Download MATLAB Toolbox for the LabelMe Image Database

The LabelMe Matlab toolbox is designed to allow you to download and interact with the images and annotations in the LabelMe database. The toolbox contains functions for plotting and querying the annotations, computing statistics, dealing with synonyms, etc. This page gives a step-by-step overview of the main toolbox functionalities.

Download

There are two ways to download the Matlab toolbox:

1. Github repository

We maintain the latest version of the toolbox on github. To pull the latest version, make sure that "git" is installed on your machine and then run "git clone https://github.com/CSAILVision/LabelMeToolbox.git" on the command line. You can refresh your copy to the latest version by running "git pull" from inside the project directory.

2. Zip file

The zip file is a snapshot of the latest source code on github.

Citation

If you use this dataset, the annotation tool, or the functions in this toolbox, we would appreciate it if you cited:

B. C. Russell, A. Torralba, K. P. Murphy, W. T. Freeman,
LabelMe: a database and web-based tool for image annotation.
International Journal of Computer Vision, pages 157-173, Volume 77, Numbers 1-3, May, 2008. (paper.pdf)

To read more about the difficulties of image annotation:

A. Barriuso and A. Torralba.
Notes on image annotation.
arXiv:1210.3448 [cs.CV] October, 2012. (paper.pdf)

Contribution

If you find this dataset useful, you can help us make it larger by visiting the annotation tool and labeling several objects. Even if your contribution seems small compared to the size of the dataset, everything counts! We also welcome submissions of copyright-free images. Your annotations and images will be made available for download immediately.

Toolbox description

A quick look into the dataset

The toolbox allows you to use the dataset online without downloading it first. Just execute the following lines to visualize the content of one of the folders in the collection:

HOMEANNOTATIONS = 'http://labelme.csail.mit.edu/Annotations';
HOMEIMAGES = 'http://labelme.csail.mit.edu/Images';
D = LMdatabase(HOMEANNOTATIONS, {'static_street_statacenter_cambridge_outdoor_2005'});
LMdbshowscenes(D, HOMEIMAGES);

This example reads the images online. Installing a local copy of the database will give you faster access to the images and annotations and will reduce the load on our server.

Downloading the LabelMe database

To download the images and annotations you can use the function LMinstall:

HOMEIMAGES = '/desired/path/to/Images';
HOMEANNOTATIONS = '/desired/path/to/Annotations';
LMinstall(HOMEIMAGES, HOMEANNOTATIONS);

Set the variables HOMEIMAGES and HOMEANNOTATIONS to point to your local paths. Downloading the entire LabelMe database can be quite slow. For additional download options, follow the instructions here.

Reading the index

The annotation files use XML format. The function LMdatabase.m reads the XML files and generates a Matlab struct array that is used to perform queries and to extract segmentations from the images. To build the index for the entire dataset, execute:

D = LMdatabase(HOMEANNOTATIONS);

D is an array with as many entries as there are annotated images. For image n, some of the fields are:

D(n).annotation.folder
D(n).annotation.filename
D(n).annotation.imagesize
D(n).annotation.object(m).name
D(n).annotation.object(m).id
D(n).annotation.object(m).username
D(n).annotation.object(m).polygon

where n and m are the image and object indices, respectively. Type help LMdatabase to see how to build the index for only some folders of the dataset.
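
As a minimal sketch, you can build a partial index by passing a cell array of folder names as the second argument (the folder name below is just the one used in the online example above; list as many folders as you need):

% build the index only for the folders you need
folderlist = {'static_street_statacenter_cambridge_outdoor_2005'};
D = LMdatabase(HOMEANNOTATIONS, folderlist);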

Visualization

Once you have created the LabelMe index D, you can visualize the annotations for one image with the function LMplot:

LMplot(D, 1, HOMEIMAGES);

You can also visualize a set of images or object crops:

LMdbshowscenes(D(1:30), HOMEIMAGES); % shows the first 30 images
LMdbshowobjects(D(1), HOMEIMAGES); % shows crops of all the objects in the first image

Object names and attributes

For each object, we store the object name in the field:

D(n).annotation.object(m).name

and the object attributes

D(n).annotation.object(m).attributes
D(n).annotation.object(m).occluded
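
As a quick sketch of how these fields can be accessed (plain struct indexing; it assumes the occluded field is stored as text, as it comes from the XML, and may be absent in older annotations):

% list the name and occlusion flag of every object in image n
n = 1;   % image index
for m = 1:length(D(n).annotation.object)
    obj = D(n).annotation.object(m);
    fprintf('object %d: %s', m, obj.name);
    if isfield(obj, 'occluded')
        fprintf(' (occluded: %s)', obj.occluded);   % e.g. 'yes' or 'no'
    end
    fprintf('\n');
end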

Object parts

Part relationships are stored as a tree. If parts are annotated, the object will have two fields:

D(n).annotation.object(m).parts.ispartof
D(n).annotation.object(m).parts.hasparts

ispartof: contains the id of the parent object.

hasparts: a comma-separated list of the ids of the polygons that are parts of the current object.

The two fields provide redundant information, but both are there to simplify exploring the part tree. For instance, if the ispartof field is empty, the current object is not a part of any other object.
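
As an example, here is a minimal sketch that lists the parts of one object by matching the ids in hasparts against the id field of the other polygons. It assumes the ids and the hasparts list are stored as text, as they come from the XML:

n = 1; m = 1;   % image and object indices (just an example)
obj = D(n).annotation.object;
if isfield(obj, 'parts') && ~isempty(obj(m).parts) && ~isempty(obj(m).parts.hasparts)
    partids = str2double(regexp(obj(m).parts.hasparts, ',', 'split')); % ids listed in hasparts
    allids  = str2double({obj.id});                                    % ids of every polygon in the image
    for k = find(ismember(allids, partids))
        fprintf('%s is a part of %s\n', obj(k).name, obj(m).name);
    end
end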

Collecting annotation statistics

The database contains many different object names. To see the list of object names and the number of times each one appears, you can use the function LMobjectnames, which shows the distribution of object names when called without output arguments:

LMobjectnames(D);

You can also get the list of object names and counts:

[names, counts] = LMobjectnames(D);

You can also get the list of attributes:

LMobjectnames(D, 'attributes');
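
With the two-output form you can, for instance, sort the names by frequency. This is a small sketch in plain Matlab; it assumes, as the call above suggests, that names is a cell array of strings and counts a numeric vector:

% show the 20 most frequent object names
[counts, order] = sort(counts, 'descend');
names = names(order);
for k = 1:min(20, length(names))
    fprintf('%5d  %s\n', counts(k), names{k});
end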

Queries

To perform searches for images, scenes, objects, attributes, etc., you can use the function LMquery. This function allows searching the content of any field.

[Dcar, j] = LMquery(D, 'object.name', 'car');

The new struct Dcar contains all the images with cars, with all other objects removed. The index array j points into the original index D, so D(j) contains all the images with cars without excluding the other objects.

The LMquery function does not assume a predefined list of fields. You can use this function to query with respect to any field. Therefore, if you add new fields inside the XML annotation files, you can still use LMquery to search with respect to the content of the new fields.
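The same mechanism works on image-level fields such as the folder name:

Dkitchen = LMquery(D, 'folder', 'kitchen'); % all images whose folder name contains 'kitchen'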

For instance, if you want to search for all the objects annotated with the attribute 'red', you can use LMquery:

Dred = LMquery(D, 'object.attributes', 'red', 'word');

Note the fourth argument 'word', which specifies the method used for matching. There are three methods (help LMquery gives the full description; a short comparison follows the list below):

[] (default) = substring matching ('air' matches 'chair')
'exact'      = the strings must match exactly
'word'       = the field must contain the query as a whole word ('air' does not match 'chair')
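
For example, the two calls below differ only in the matching method; both reuse the LMquery syntax from above:

Dsub  = LMquery(D, 'object.name', 'car');          % substring match: also returns names such as 'carpet'
Dword = LMquery(D, 'object.name', 'car', 'word');  % word match: 'car', 'car occluded', but not 'carpet'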

Exclusion can be used to narrow down a search. Compare these two:

LMdbshowobjects(LMquery(D, 'object.name', 'mouse+pad'), HOMEIMAGES);
LMdbshowobjects(LMquery(D, 'object.name', 'mouse-pad'), HOMEIMAGES);

You can also combine searches. The next line selects objects that belong to one of these groups: 1) side views of cars, 2) buildings, 3) roads, or 4) trees:

[Dcbrt, j] = LMquery(D, 'object.name', 'car+side,building,road,tree');

You can also do AND combinations by using several queries. For instance, to get the list of images that contain buildings, side views of cars, and trees, you can do:

[D1,j1] = LMquery(D, 'object.name', 'building');
[D2,j2] = LMquery(D, 'object.name', 'car+side');
[D3,j3] = LMquery(D, 'object.name', 'tree');
j = intersect(intersect(j1,j2),j3);

The index array j points to all the images containing the three objects. Note that D(j) will also contain other objects, but it is guaranteed to contain the previous three.
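For instance, you can pass D(j) to the display function used earlier:

LMdbshowscenes(D(j), HOMEIMAGES); % images containing buildings, side views of cars, and trees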

Extracting polygons and segments

The toolbox provides a set of generic functions to extract polygons and segments from the annotations. To extract the polygon coordinates for one object, you can use:

[x,y] = LMobjectpolygon(Dcar(1).annotation, 1);
figure
plot(x{1}, y{1}, 'r')
axis('ij')

In this case, the function returns the first polygon of the first image in the index. LMobjectpolygon returns a cell array with one entry for each polygon requested.

To extract segmentation masks you can use the function LMobjectmask:

[mask, class] = LMobjectmask(D(1).annotation, HOMEIMAGES);
imshow(colorSegments(mask))

You can use this function to extract segmentation masks for all the objects that belong to a single category or for individual polygons. Type help LMobjectmask to see more examples.
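
For example, reusing the Dcar query from above, the same call returns masks only for the car polygons of the first matching image:

% masks for the car polygons in the first image that contains a car
[mask, class] = LMobjectmask(Dcar(1).annotation, HOMEIMAGES);
imshow(colorSegments(mask))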

A related function is LM2segments.m, which transforms all the annotations into segmentation masks and provides a unique index for each object class.

Image manipulation

The toolbox also includes functions to crop and resize images together with their annotations.

Dealing with synonyms and labeling noise

Because the online annotation tool gives no specific instructions about how labels should be entered, different text descriptions end up being used for the same object category. For instance, a person can be described as a "person", "pedestrian", "person walking", "kid", etc. Therefore, it is important to unify the annotations. How the annotations should be unified depends on what you want to do, so here we provide a set of tools to replace object names.

LMreplaceobjectname

This function is useful when you want to replace a few object names. To replace an object name, use the function LMreplaceobjectname. For instance, the next line replaces all the object names that contain the string 'person' or 'pedestrian' with the string 'person'.

D = LMreplaceobjectname(D, 'person,pedestrian', 'person', 'rename');

Type help LMreplaceobjectname to see other options.

LMaddtags

The function LMaddtags replaces LabelMe object descriptions with the names in the list tags.txt. You can extend this list to include more synonyms. Use this function to reduce the variability in the object labels used to describe the same object class. However, the original LabelMe descriptions contain more specific information, and you might want to generate other tag files to account for a particular level of description. Details on the structure of the text file are given below.

To call the function:

tagsfilename = 'tags.txt';
[D, unmatched] = LMaddtags(D, tagsfilename);

After running this line, the struct D will contain a unified list of objects. The variable 'unmatched' gives the list of LabelMe descriptions that were not found inside tags.txt. The file tags.txt contains a list of tags and the LabelMe descriptions that map to each tag. You can add more terms to tags.txt. For instance, the next lines unify a few of the descriptions into the tags 'person' and 'car' (lmd means LabelMe Description):

TAG: person
lmd: person walking
lmd: person
lmd: person standing
lmd: person occluded
TAG: car
lmd: car
lmd: car occluded
lmd: suv

LMaddwordnet

Another way of unifying the annotations is to use WordNet. You can see a demo in the script demoWordnet.m.

sensesfile = 'wordnetsenses.txt'; % this file contains the list of wordnet synsets.
[D, unmatched, counts] = LMaddwordnet(D, sensesfile);

We can now use the power of WordNet to do more than unify the annotations: we can extend the annotations by including other terms. For instance, you can explore the WordNet tree here. The online search tool uses WordNet to extend the annotations; for example, we can search for animals (query = animal) even though users rarely provide this label.

Annotate your own images

The function LMphotoalbum creates a web page with thumbnails linked to the online annotation tool. You can use this function to create a page showing images to annotate. This is useful if you want other people to help you: you can create one page for each person, each with a different partition of the set of images that you want to label.

LMphotoalbum(folderlist, filelist, webpagename, HOMEIMAGES);

For instance, if you want to create a web page with images of kitchens, you can do:

D = LMquery(D, 'folder', 'kitchen');
LMphotoalbum(D, 'myphotoalbum.html');

If you want to annotate your own images, you need to upload them to LabelMe first. If you have a set of images, you can send us an email with a link to a file containing all your images. We will create a folder in LabelMe with your images.

The pictures that you upload, along with the annotations that you provide, will be made available for computer vision research as part of the LabelMe database.

Scene recognition

Gist descriptor

Here we provide a function to compute the gist descriptor as described in: Aude Oliva, Antonio Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. International Journal of Computer Vision, Vol. 42(3): 145-175, 2001. To compute the gist descriptor for an image, use the function LMgist. Here is an example that reads an image and computes the descriptor:

% Load image
img = imread('demo1.jpg');

% Parameters:
param.imageSize = 128;
param.orientationsPerScale = [8 8 8 8];
param.numberBlocks = 4;
param.fc_prefilt = 4;

% Computing gist:
[gist, param] = LMgist(img, '', param);

% Visualization
figure
subplot(121)
imshow(img)
title('Input image')
subplot(122)
showGist(gist, param)
title('Descriptor')

You can also compute the gist for a collection of images:

gist = LMgist(D, HOMEIMAGES, param);

The output is an array of size [Nscenes Nfeatures], where Nscenes = length(D).
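
A common use of this matrix is simple scene retrieval by comparing gist descriptors. Here is a minimal sketch using plain Euclidean distances; it only reuses the variables and display functions introduced above and assumes the collection has at least a dozen images:

% find the 10 scenes most similar to a query scene according to the gist descriptor
q = 1;                                                        % index of the query scene
d = sum((gist - repmat(gist(q,:), size(gist,1), 1)).^2, 2);   % squared distance to the query
[~, neighbors] = sort(d, 'ascend');
LMdbshowscenes(D(neighbors(2:11)), HOMEIMAGES);               % 10 nearest scenes, skipping the query itself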

Estimation of the horizon line using the Gist descriptor

The goal is to estimate the location of the horizon line in an image. This function uses the approach described in:

To estimate the location of the horizon line, call the function getHorizon.m. It returns the location of the horizon line; the units represent the distance to the center of the image (normalized with respect to the image height):

h = getHorizon(img); % h is a value in the range [-0.5, 0.5]

[nrows, ncols, cc] = size(img);

figure
imshow(img)
hold on
plot([1 ncols], ([h h]+.5)*nrows, 'b', 'linewidth',3);

The estimator has already been trained using street scenes. The parameters of the estimator are stored in the file streets_general_camera_parameters.mat.

If you want to retrain the estimator, you can use the script trainHorizon.m. The training data is stored in the file streets_general_camera_training.mat; inside that file, the variable 'hor' contains all the training data and the list of LabelMe images used for training.

SIFT descriptor

Here we provide a function to compute dense SIFT features as described in:

The function LMdenseSift.m computes a SIFT descriptor at each pixel location (in this implementation there is no ROI detection as in the original definition by D. Lowe). This function is a modification of the code provided by S. Lazebnik; the current implementation uses convolutions. Here is an example of how to compute the dense SIFT descriptors for an image and visualize them as described in Liu et al. 09.

% demo SIFT using LabelMe toolbox

img = imread('demo1.jpg');
img = imresize(img, .5, 'bilinear');

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SIFT parameters:
SIFTparam.grid_spacing = 1; % distance between grid centers
SIFTparam.patch_size = 16; % size of the patch from which to compute the SIFT descriptor (must be a multiple of 4)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% CONSTANTS (you cannot change this)
w = SIFTparam.patch_size/2; % boundary

% COMPUTE SIFT: the output is a matrix [nrows x ncols x 128]
SIFT = LMdenseSift(img, '', SIFTparam);

figure
subplot(121)
imshow(img(w:end-w+1,w:end-w+1,:))
title('cropped image')
subplot(122)
showColorSIFT(SIFT)
title('SIFT color coded')

Other related functions: demoVisualWords.m, LMkmeansVisualWords.m, LMdenseVisualWords.m