DATA | LSC'21 at ICMR'21

LSC'21 reuses the LSC'20 dataset, a multimodal dataset covering four months of data from one active lifelogger. It is based on the previously developed NTCIR Lifelog datasets and merges the datasets from 2015, 2016 and 2018. The dataset consists of three files, each of which is password protected:

Core Image Dataset (38.49GB) of wearable camera images, fully redacted and anonymised in 1024 x 768 resolution, captured using OMG Autographer and Narrative Clip devices. These images were collected during periods in 2015, 2016 and 2018. All faces and readable text have been removed, and certain scenes and activities have been manually filtered out to respect local privacy requirements.
Metadata for the collection (2.8MB), consisting of textual metadata representing time, physical activities, biometrics, locations, etc. Please note that there are no HR biometrics for the 2015 data.
Visual Concepts (79.9MB) extracted from the non-redacted version of the visual dataset. Please note that there are four images in the visual concepts that are not in the Core Image Dataset. This is not a mistake; those four images are no longer in the LSC collection.
Donated data: The MySceal team have donated the Microsoft Cognitive Services image annotation outputs for the LSC'20 collection.

For access to the full dataset, please email cathal dot gurrin at dcu.ie

The Visual Concepts data file includes detected scenes and concepts for each image (processed over the non-redacted version of the images). The format of the descriptor for each image is as follows:
- attribute_top{i} : the attribute of the scene detected automatically from the image.
- category_top{i} : the category of the scene detected automatically from the image.
- category_top{i}_score : the confidence score of the scene prediction output.
- concept_class{i} : the objects detected automatically from the image. The object category list of the 2014-2017 COCO datasets, with 80 labels, is used.
- concept_score_top{i} : the confidence score of the object detection output.
- concept_bbox_top{i} : the bounding box of the detected object in the format of {top_x top_y bottom_x bottom_y}.
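
As an illustration of how the object fields listed above fit together, the following Python sketch groups them by rank for a single image. It is a minimal sketch under stated assumptions, not part of the official release: it assumes each image's descriptor has already been loaded as a dictionary of strings keyed by the field names above, that the rank suffix {i} is supplied by the caller, that empty labels mean no detection at that rank, and that the bounding box values are whitespace separated (with optional surrounding braces). The names DetectedObject, parse_bbox and objects_from_row are hypothetical helpers introduced here.

```python
from typing import Dict, List, NamedTuple, Tuple


class DetectedObject(NamedTuple):
    """One detected object (hypothetical container, not part of the dataset)."""
    label: str
    score: float
    bbox: Tuple[float, float, float, float]  # (top_x, top_y, bottom_x, bottom_y)


def parse_bbox(text: str) -> Tuple[float, float, float, float]:
    """Parse 'top_x top_y bottom_x bottom_y' into four floats.

    Assumption: values are whitespace separated; surrounding braces,
    if present in the file, are stripped.
    """
    parts = text.strip().strip("{}").split()
    if len(parts) != 4:
        raise ValueError(f"expected 4 bbox values, got {len(parts)}: {text!r}")
    top_x, top_y, bottom_x, bottom_y = (float(p) for p in parts)
    return (top_x, top_y, bottom_x, bottom_y)


def objects_from_row(row: Dict[str, str], ranks: List[str]) -> List[DetectedObject]:
    """Collect the object detections of one image descriptor.

    `ranks` holds the suffixes that replace {i} in the field names;
    their exact form in the data file is an assumption left to the caller.
    """
    objects = []
    for i in ranks:
        label = row.get(f"concept_class{i}", "")
        if not label:
            continue  # assumption: an empty label means no detection at this rank
        objects.append(
            DetectedObject(
                label=label,
                score=float(row[f"concept_score_top{i}"]),
                bbox=parse_bbox(row[f"concept_bbox_top{i}"]),
            )
        )
    return objects


# Example with made-up values, for illustration only.
example_row = {
    "concept_class1": "person",
    "concept_score_top1": "0.98",
    "concept_bbox_top1": "10 20 110 220",
    "concept_class2": "",
}
example_objects = objects_from_row(example_row, ["1", "2"])
```

The same grouping could be applied to the scene fields (attribute_top{i}, category_top{i}, category_top{i}_score) if they are needed alongside the detected objects.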

LSC'21 Data Release Forms

Participants are required to sign two forms to access the datasets: an organisation agreement form for their organisation (signed by the research team leader) and an individual agreement form for each member of the research team who will access the data. The organisation agreement form should be sent to the LSC organisers (lsc@computing.dcu.ie) in PDF format. The individual agreement form must be signed by all researchers who will use the data and kept on file by the organisation. It should not be sent to the organisers unless requested at a later date.

Organisation Agreement form: to be signed by the organisation to which the participants belong. This form must be signed and sent by email to the LSC organisers (lsc@computing.dcu.ie).
Individual Agreement form: to be signed by each individual researcher wishing to use the LSC data collection. This form must be filed by the participating organisation, but it does not need to be sent to the organisers.

Upon completion of this process, the participants will be sent details about how to access the dataset. Please note that the zip file is also password protected.

A suitable reference for the dataset in LSC and subsequent papers is as follows:

@inproceedings{LSC21,
author = {Cathal Gurrin and Björn Þór Jónsson and Klaus Schöffmann and Duc-Tien Dang-Nguyen and Jakub Lokoč and Minh-Triet Tran and Wolfgang Hürst and Luca Rossetto and Graham Healy},
title = {Introduction to the Fourth Annual Lifelog Search Challenge, LSC’21},
booktitle = {Proc. International Conference on Multimedia Retrieval (ICMR’21)},
publisher = {ACM},
address = {Taipei, Taiwan},
year = 2021,
}

LSC'20 Development Topics

A suite of development topics will be available to assist teams in developing their lifelog search engines. The LSC'19 topics are available for system testing, as are the LSC'20 topics with relevance judgements. These topics were developed for the 2016 subset of the dataset, and relevance judgements are provided only for the 2016 data.
An evaluation system associated with these development topics will be operational soon. It will allow teams to input image IDs and receive a score that depends on submission accuracy.