In the development of computer vision applications, a fundamental role is played by the availability of large datasets of annotated images and videos (ground truth) that cover a wide range of scenarios and environments. Such datasets serve two purposes. First, they are used to train machine-learning approaches, which have been widely and successfully adopted in computer vision but still suffer from the lack of comprehensive, large-scale training data. Second, they are used to evaluate algorithm performance; the evaluation must provide enough evidence, to developers and especially to peer scientists reviewing the work, that a method works well in the targeted environment and conditions.
The main obstacle to collecting large-scale ground truth is the daunting amount of time and human effort needed to produce high-quality annotations; it has been estimated that labelling a single image may take from two to thirty minutes, depending on the task, and the cost is even higher for videos.
Currently, most available datasets and their ground truth are the result of efforts by single research groups that annotated them manually; these datasets, however, are too task-oriented to be generalized.
Moreover, the large-scale ground truth gathering approaches tried so far suffer from many limitations, ranging from incomplete or low-quality annotations (due to the lack of quality control) to interoperability issues, since no common representation schema has been adopted yet.
In addition, identifying metrics for performance evaluation is not always trivial. A notable case is object tracking, for which some research groups have developed self-evaluation-based approaches. The availability of massive ground truth would support the development of such methods and, in the long run, make them independent of ground truth; this would be in line with the current wave of scientific development, which is “data-driven” rather than theory- or simulation-driven.
This workshop follows the successful VIGTA 2012 workshop, which was organized as part of ACM AVI 2012. Whereas VIGTA’12 focused mainly on user interfaces supporting the ground truth generation task, this new edition covers a broader range of topics, from ground truth generation to performance evaluation to the semantic web for unifying existing efforts.
Research topics of interest for this workshop include, but are not limited to: