ImageCLEF - Photo Annotation

The steadily increasing amount of multimedia data poses challenging questions on how to index, visualise, organise, navigate, or structure multimedia information. The benefit of different automated approaches is often not clear, as usually they are evaluated on different datasets with different performance measures. Evaluation campaigns aim at establishing an objective comparison between the performance of different approaches by posing well-defined tasks including datasets, topics, and measures.

The Visual Concept Detection and Annotation Task (VCDT) is part of the evaluation initiative »ImageCLEF« and focuses on the evaluation of multi-modal image annotation approaches in consumer photos with respect to user requirements on photo collection organisation. The task challenges the participants to deal with an unbalanced number of annotations per image, an unbalanced number of images per concept, the subjectivity of concepts such as boring, cute, or fancy, and the diversity of images belonging to the same concept. Additionally, textual and multi-modal runs have to cope with images lacking EXIF data and/or Flickr user tags. The VCDT was organised within the research programme THESEUS in the ImageCLEF evaluation cycles of 2009, 2010, and 2011 and in the ImageCLEF@ICPR 2010 contest by Fraunhofer IDMT. 

The test collections of the different VCDT cycles are now available for download for research purposes. The photos used in the benchmark are a subset of the MIRFlickr collection, which comprises 25,000 photos from Flickr that were published under a Creative Commons license. The creator information as well as the exact license type and image title are distributed together with the images. Further, the dataset includes Flickr user tags and EXIF information. Many thanks to the University of Leiden for collecting and providing this dataset. The set used for the VCDT in the ImageCLEF cycles contains 18,000 Flickr photos in total. In all evaluation cycles, the VCDT test collection was fully assessed with relevance judgements for the annotation task. The assignment of the ImageCLEF IDs to the MIRFlickr IDs can be found here.

If you use the VCDT test collections in your work, please cite the overview paper of the corresponding evaluation cycle.

VCDT in ImageCLEF 2009

The VCDT 2009 poses the challenge of multi-label classification in consumer photos with the help of ontology knowledge. The participants were provided with a training set of annotated Flickr photos and asked to automatically annotate a test set with a number of visual concepts. All photos were manually annotated with 53 visual concepts, which are organised in the Photo Tagging Ontology (PTO). The ontology was available as additional textual information to enhance the visual analysis algorithms and to validate the output of the classifiers.

The VCDT 2009 mainly addresses two issues:

1. Can image classifiers scale to the large number of concepts and data?

2. Can an ontology (hierarchy and relations) help in large-scale annotation?
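As a rough illustration of how ontology knowledge could be used to validate classifier output, the following minimal sketch adds implied parent concepts to a predicted label set and flags mutually exclusive concepts that were assigned together. The concept names, the hierarchy, and the disjoint groups are hypothetical toy data, not the actual content of the PTO, and the function names are assumptions made for this example.

```python
# Hypothetical toy ontology for illustration only; NOT the actual PTO.
PARENT = {"Trees": "Plants", "Flowers": "Plants", "Plants": "Nature"}
DISJOINT = [{"Day", "Night"}, {"Indoor", "Outdoor"}]


def add_ancestors(labels, parent=PARENT):
    """Return the labels plus every ancestor implied by the hierarchy."""
    closed = set(labels)
    for label in labels:
        node = label
        # Walk up the hierarchy until a root concept is reached.
        while node in parent:
            node = parent[node]
            closed.add(node)
    return closed


def find_conflicts(labels, disjoint=DISJOINT):
    """Return the assigned labels of each mutually exclusive group
    in which more than one concept was predicted."""
    assigned = set(labels)
    return [group & assigned for group in disjoint if len(group & assigned) > 1]


predicted = {"Trees", "Day", "Night"}
print(add_ancestors(predicted))
print(find_conflicts(predicted))
```

A participant could use such checks as a post-processing step, for example by resolving a flagged conflict in favour of the concept with the higher classifier confidence.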

If you use the VCDT test collection 2009 in your work, please cite the overview paper of the 2009 evaluation cycle.

VCDT at the ImageCLEF@ICPR contest

In 2010, ImageCLEF held an additional benchmarking event at the International Conference on Pattern Recognition (ICPR). At the ICPR contest, the VCDT posed a problem similar to that of ImageCLEF 2009. Again, participants were asked to automatically annotate a set of Flickr photos with visual concepts. This time, an additional validation set was provided, and the numbers of photos in the test and training sets differed from those in ImageCLEF 2009. However, the research challenge remained the same.

If you use the VCDT test collection @ICPR 2010 in your work, please cite the overview paper of the @ICPR 2010 evaluation cycle.

VCDT in ImageCLEF 2010

In ImageCLEF 2010, the VCDT included user-generated metadata as an additional textual resource, and a distinction was made between three configurations:

1. Automatic annotation with content-based visual information of the images.

2. Automatic annotation with Flickr user tags and EXIF metadata in a purely textual scenario.

3. Multi-modal approaches that consider both visual and textual information, such as Flickr user tags and EXIF information.

In all cases, the participants of the task were asked to annotate the images of the test set with a predefined set of concepts, allowing for an automated evaluation and comparison of the different approaches. The number of visual concepts was substantially extended to 93 concepts. 52 of the 53 former concepts were reused. In contrast to the previous annotations, the new annotations were obtained with a crowdsourcing approach that utilises Amazon Mechanical Turk (MTurk).

The focus of the task lies in comparing the strengths and limitations of the different approaches:

  • Do multi-modal approaches outperform text-only or visual-only approaches?
  • Which approaches are best for which kind of concepts?
  • Can image classifiers scale to the large number of concepts and data?
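To make the difference between the visual, textual, and multi-modal configurations more concrete, the following minimal sketch combines per-concept scores from a visual and a textual classifier by weighted late fusion. This is only one of many possible fusion strategies and is not the method of any particular participant; the function name, the weight parameter, and the score dictionaries are assumptions made for this example. The fallback to visual scores reflects the fact that some images lack Flickr user tags and/or EXIF data.

```python
def fuse_scores(visual, textual=None, visual_weight=0.5):
    """Combine per-concept scores from two modalities (late fusion).

    visual, textual: dicts mapping concept name -> score in [0, 1].
    If no textual scores exist for an image or a concept, the visual
    score is used unchanged.
    """
    if textual is None:
        return dict(visual)
    fused = {}
    for concept, v_score in visual.items():
        t_score = textual.get(concept)
        if t_score is None:
            fused[concept] = v_score
        else:
            fused[concept] = visual_weight * v_score + (1 - visual_weight) * t_score
    return fused


# Hypothetical scores for a single image.
visual_scores = {"Sky": 1.0, "Cute": 0.2}
textual_scores = {"Sky": 0.0}
print(fuse_scores(visual_scores, textual_scores))
```

Running the same evaluation on the visual-only, textual-only, and fused scores would then allow a per-concept comparison of the three configurations.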

If you use the VCDT test collection 2010 in your work, please cite the overview paper of the 2010 evaluation cycle.

VCDT in ImageCLEF 2011

The evaluation objective in 2011 lies in the automated detection of 99 visual concepts, including sentiments, and in a concept-based retrieval scenario on 200,000 Flickr images. Both the sentiment annotation and the concept-based retrieval may consider visual, textual, or multi-modal information. The concept-based retrieval task focuses on retrieval for 40 different topics, which are based on query logs. Each topic can be solved with a logical connection of concepts of the PTO. Relevance assessment is performed with MTurk and uses pooling for the concept-based retrieval task and a fully assessed test collection for the annotation task.
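The idea of solving a retrieval topic through a logical connection of concepts can be sketched as follows. The example assumes that each image already has a set of detected concepts and expresses a topic as concepts that must all be present, concepts of which at least one must be present, and concepts that must be absent. The function name, its parameters, the image IDs, and the concept names are hypothetical and chosen for illustration only.

```python
def retrieve(annotations, all_of=(), any_of=(), none_of=()):
    """Return the sorted IDs of images whose concepts satisfy the query.

    annotations: dict mapping image ID -> set of detected concepts.
    all_of:  concepts that must all be present (logical AND).
    any_of:  concepts of which at least one must be present (logical OR).
    none_of: concepts that must not be present (logical NOT).
    """
    hits = []
    for image_id, concepts in annotations.items():
        if not set(all_of) <= concepts:
            continue
        if any_of and not (set(any_of) & concepts):
            continue
        if set(none_of) & concepts:
            continue
        hits.append(image_id)
    return sorted(hits)


# Hypothetical annotations for three images.
annotations = {
    "img1": {"Dog", "Outdoor", "Day"},
    "img2": {"Dog", "Indoor"},
    "img3": {"Cat", "Outdoor", "Night"},
}
print(retrieve(annotations, all_of=["Outdoor"], any_of=["Dog", "Cat"], none_of=["Night"]))
```

In practice, a system would work with concept confidence scores instead of binary sets in order to produce a ranked result list, which is what pooling-based relevance assessment operates on.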

If you use the VCDT test collection 2011 in your work, please cite the overview paper of the 2011 evaluation cycle.
