DATA | LSC'20 at ICMR'20

LSC'20 is a new multimodal dataset covering four months of data from one active lifelogger. It is based on the previously developed NTCIR Lifelog datasets, merging the datasets from 2015, 2016 and 2018. The dataset consists of three files, each of which is password protected:

- Core Image Dataset (38.49GB) of wearable camera images, fully redacted and at 1024 x 768 resolution, captured using OMG Autographer and Narrative Clip devices. These images were collected during periods in 2015, 2016 and 2018. The images are anonymised, which means that faces are blurred. Updated 10 March 2020 to resize one day of full-size videos and remove unnecessary metadata folders. For access to the image collection, please email cathal.gurrin at dcu.ie
- Metadata for the collection (2.8MB), consisting of textual metadata representing time, physical activities, biometrics, locations, etc. Please note that there are currently no HR biometrics for the 2015 data. Updated 12 Feb 2020.
- Visual Concepts (79.9MB) extracted from the non-redacted version of the visual dataset. Updated 06 March 2020 to include missing images and bounding boxes for objects, and to fix a mismatch in the UTC time mapping in the 2016 data.

The Visual Concepts data file includes detected scenes and concepts for each image (processed over the non-redacted version of the images). The format of the descriptor for each image is as follows:
- attribute_top{i} : the attribute of the scene detected automatically from the image.
- category_top{i} : the category of the scene detected automatically from the image.
- category_top{i}_score : the confidence score of the scene prediction output.
- concept_class{i} : the class of the object detected automatically from the image. We use the object category list of the 2014-2017 COCO datasets, which has 80 labels.
- concept_score_top{i}: the confidence score of the object detection output.
- concept_bbox_top{i}: the bounding box of the detected object in the format of {top_x top_y bottom_x bottom_y}.
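As an illustration of how these fields fit together, the following is a minimal Python sketch that groups the object fields of one image record into (label, score, bounding box) triples. It rests on several assumptions that are not confirmed by the description above: that each record is available as a dictionary of strings (for example, one row read with csv.DictReader), that {i} is replaced by a digit string, and that the bounding box values are separated by whitespace. The names DetectedObject, parse_bbox and objects_from_row are hypothetical helpers, and the values in example_row are made up for illustration only.

```python
import re
from typing import Dict, List, NamedTuple, Tuple

BBox = Tuple[float, float, float, float]  # (top_x, top_y, bottom_x, bottom_y)


class DetectedObject(NamedTuple):
    label: str
    score: float
    bbox: BBox


# Assumption: {i} is a digit string; an optional "_top" is tolerated in the class key.
CLASS_KEY = re.compile(r"concept_class(?:_top)?(\d+)")


def parse_bbox(text: str) -> BBox:
    """Parse 'top_x top_y bottom_x bottom_y' (optionally wrapped in braces)."""
    parts = text.strip().strip("{}").split()
    if len(parts) != 4:
        raise ValueError(f"expected 4 bounding box values, got {len(parts)}: {text!r}")
    top_x, top_y, bottom_x, bottom_y = (float(p) for p in parts)
    return top_x, top_y, bottom_x, bottom_y


def objects_from_row(row: Dict[str, str], min_score: float = 0.0) -> List[DetectedObject]:
    """Collect the detected objects of one record whose score is >= min_score."""
    objects = []
    for key, label in row.items():
        match = CLASS_KEY.fullmatch(key)
        if match is None or not label.strip():
            continue  # not an object class field, or an empty slot
        index = match.group(1)
        score_text = row.get(f"concept_score_top{index}", "")
        bbox_text = row.get(f"concept_bbox_top{index}", "")
        if not score_text.strip() or not bbox_text.strip():
            continue  # skip incomplete detections
        score = float(score_text)
        if score >= min_score:
            objects.append(DetectedObject(label.strip(), score, parse_bbox(bbox_text)))
    return objects


# Made-up example record; real field values may differ.
example_row = {
    "category_top1": "office",
    "category_top1_score": "0.71",
    "concept_class1": "laptop",
    "concept_score_top1": "0.93",
    "concept_bbox_top1": "120 200 640 700",
    "concept_class2": "cup",
    "concept_score_top2": "0.41",
    "concept_bbox_top2": "10 20 60 90",
}

if __name__ == "__main__":
    for obj in objects_from_row(example_row, min_score=0.5):
        print(obj.label, obj.score, obj.bbox)
```

Filtering on min_score is a design choice of this sketch rather than part of the dataset format; it simply makes it easy to drop low-confidence detections before indexing.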

LSC'20 Data Release Forms

Participants are required to sign two forms to access the datasets: an organisation agreement form for the participating organisation (signed by the research team leader) and an individual agreement form for each member of the research team who will access the data. The organisation agreement form should be sent to the LSC organisers (lsc@computing.dcu.ie) in PDF format. The individual agreement form must be signed by all researchers who will use the data and kept on file by the organisation. It should not be sent to the organisers unless requested at a later date.

- Organisation Agreement form: to be signed by the organisation to which the participants belong. This form must be signed and sent by email to LSC organisers (lsc@computing.dcu.ie).
- Individual Agreement form: to be signed by each individual researcher wishing to use the LSC data collection. This form must be filed by the participating organisation, but it does not need to be sent to the organisers.

Upon completion of this process, the participants will be sent details about how to access the dataset. Please note that the zip file is also password protected.

A suitable reference for the dataset in LSC and subsequent papers is as follows:

@inproceedings{LSC20,
address = {Dublin, Ireland},
author = {Gurrin, Cathal and Le, Tu-Khiem and Ninh, Van-Tu and Dang-Nguyen, Duc-Tien and Jónsson, Björn Þór and Lokoč, Jakub and Hurst, Wolfgang and Tran, Minh-Triet and Schoeffmann, Klaus},
booktitle = {ICMR '20, The 2020 International Conference on Multimedia Retrieval},
publisher = {ACM},
title = {{An Introduction to the Third Annual Lifelog Search Challenge, LSC'20}},
year = {2020}
}

LSC'20 Development Topics

The suite of development topics will be available to assist teams in developing their lifelog search engines. The LSC'19 topics are available for system testing. These were developed for the 2016 subset of the dataset, and relevance judgments are provided only for the 2016 data.
Associated with these development topics, there will be an evaluation system, operational soon, that allows teams to input image IDs and receive a score depending on submission accuracy.