Data Description
The NTCIR-13 Lifelog data consists of at least 45 days of data from two active lifeloggers. The full phase-2 dataset contains the following data:
Multimedia
- Wearable camera images from a Narrative Clip 2, set to capture at 45-second intervals from breakfast until sleep. This gives about 1,500 images per day. The accompanying output of a concept detector is provided to assist teams in building a search engine for the data.
- Music listening history (see an example of music listening history here)
Biometrics
- Biometrics 24x7 (heart rate, galvanic skin response, calorie burn, steps)
- Blood Pressure daily, in the morning after preparing (but before eating) breakfast and before exercising
- Blood Sugar levels every morning after waking up, before eating.
Human Activity
- Semantic locations visited
- Physical activities
- Daily mood, according to Thayer's two-dimensional model of mood
- Diet log (manual logging of photos of food).
Computer Usage (as document vectors on a per-minute basis)
- Computer input via keyboard and information consumed on the computer via ASR of on-screen activity on a per-minute basis. This data is filtered using a blacklist, anonymised and then stemmed using an English language stemmer. Each minute is represented by a sorted document vector.
Baseline Search Engine
To assist participating groups, the organisers have developed a baseline search engine. It is accessible here (http://search-lifelog.computing.dcu.ie/) and can be used to submit basic queries to the system. Queries can filter images by userID, location and/or physical activity. Visual concept search will be added on 30th June.
Registration and Data Release Forms
Every participating group must first register with NTCIR and indicate its intention to take part in the lifelog task. This can be done by following this link: registering for the Lifelog task at NTCIR-13.
Once this registration with NTCIR is completed, NTCIR-Lifelog participants are required to sign two forms to access the datasets: an organisational agreement form for your organisation (signed by the research team leader) and an individual agreement form for each member of the research team who will access the data. The organisation agreement form (http://ntcir-lifelog.computing.dcu.ie/data/NTCIR13-lifelog-concepts.zip) should be sent to the lifelog task organisers (ntcir-lifelog@computing.dcu.ie) in PDF format. The individual agreement form must be signed by all researchers who will use the data and kept on file by the organisation. It should not be sent to the organisers unless requested at a later date.
- Organisation Agreement form: to be signed by the organisation to which the participants belong. This form must be signed and sent by email to NTCIR-Lifelog organisers (ntcir-lifelog@computing.dcu.ie).
- Individual Agreement form: to be signed by each individual researcher wishing to use the NTCIR-Lifelog data collection. This form must be filed by the participating organisation, but it does not need to be sent to the lifelog organisers.
Upon completion of this process, the participants will be sent a unique username and password to access the dataset. Please see the section below.
Access to the LifeLog datasets
The datasets can be downloaded below. Each link is password-protected, and each organisation will receive a unique username and password to access the data. To get these access codes, please email the organisers (ntcir-lifelog@computing.dcu.ie) with the signed organisation agreement form attached.
- DryRun Phase-1 Lifelog-2 dataset: http://ntcir-lifelog.computing.dcu.ie/data/NTCIR13-lifelog2-phase1-dryrun-dataset.zip (229MB). There is also a set of five topics selected as dry-run topics for this ontology. This is a password-protected ZIP file. Please contact the organisers if you don't have the ZIP file password.
- Full Phase-1 Lifelog-2 dataset (32 days of image data with biometrics, activities and locations): http://ntcir-lifelog.computing.dcu.ie/data/NTCIR13-lifelog2-phase1-images.zip (6.5GB). An associated ontology of lifelogging concepts is also provided. Fifteen concepts have been chosen from this ontology as topics. These topics are available from the NTCIR-13 Lifelog site, hosted by NII. This dataset is used with the LAT sub-task only. This is a password-protected ZIP file. Please contact the organisers if you don't have the ZIP file password.
- Full Phase-2 Lifelog-2 dataset (90 days of image data with biometrics, activities, locations, information accesses, semantic annotations): http://ntcir-lifelog.computing.dcu.ie/data/NTCIR13-lifelog2-phase2-images.zip (25.5GB) and the associated metadata http://ntcir-lifelog.computing.dcu.ie/data/NTCIR13-lifelog2-phase2-metadata.zip. Five Dry-Run Topics are available for this dataset. This dataset is used for the LSAT, LIT and LEST sub-tasks. You may wish to download the visual concept annotations (zipped CSV format) for u1 and u2, which are based on this visual concept list and were generated using the Computer Vision API from Microsoft. The annotations are in the format [imageID, concept]. Information access data for u1 is available, though it will not form part of the queries for NTCIR-13. The zip files were encoded on MacOS X and have the standard password protection applied (if not sure, ask the organisers).
We used to support a baseline search interface which could be used to submit basic queries and upon which participants could build search and browsing tools. This is no longer available.
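As an illustration of how the [imageID, concept] annotations might be used, the following is a minimal Python sketch that builds a concept-to-images lookup. It assumes one image/concept pair per CSV row and no header row; the function name `load_concept_index` and the sample rows are made up for this example and are not part of the released data.

```python
import csv
import io
from collections import defaultdict


def load_concept_index(csv_file):
    """Build a concept -> set of image IDs lookup from [imageID, concept] rows."""
    index = defaultdict(set)
    for row in csv.reader(csv_file):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        image_id, concept = row[0].strip(), row[1].strip()
        # Lower-case the concept so lookups are case-insensitive.
        index[concept.lower()].add(image_id)
    return index


# Made-up sample rows; real image IDs and concepts will differ.
sample = io.StringIO("img_0001,coffee\nimg_0001,table\nimg_0002,coffee\n")
concept_index = load_concept_index(sample)
print(sorted(concept_index["coffee"]))
```

With the real files, `csv_file` would be an open handle on one of the unzipped annotation CSVs. If the files turn out to have a header row, it should be skipped before indexing.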
Format of the NTCIR-13 LifeLog datasets
The root of the ZIP files contains an .xml file, which is a simple aggregation of all users' data. It is structured as follows:
The root node of the data is the USERS tag. Each user element contains all the data of that user (u1 or u2). Each user has a USER tag that contains the user ID as an attribute, for example: [user id="u1"]. Inside the USER element is his/her data:
Following that is a DAYS tag, which contains that user's lifelogging information organised per day. Each day is enclosed in a DAY tag that holds the data (a DATA tag), the relative path to the directory containing the images captured on that day (the IMAGES-DIRECTORY tag), and then the minutes of that day under a root tag called MINUTES.
At the start of each day there is a set of daily metadata for that user. This data takes three forms: BIOMETRICS, ACTIVITIES and PERSONAL LOGS. The biometrics contain WEIGHT, FAT MASS, HEART RATE, SYSTOLIC blood pressure and DIASTOLIC blood pressure, which were readings taken after waking up each day. The activities contain summary activities: STEPS taken that day, DISTANCE walked in metres that day and ELEVATION climbed in metres that day. The personal logs contain HEALTH LOGS, including the TIME of reading, GLU (glucose levels in the blood), BP (blood pressure), HR (heart rate), MOOD (manually logged every morning) and sometimes a COMMENT, as well as DRINK LOGS and FOOD LOGS, which were manually logged throughout the day.
Following that, the day's data is organised into minutes. The MINUTES element contains exactly 1440 child elements (called MINUTE). Each child has an ID (for example: [minute id="0"], [minute id="1"], [minute id="2"], etc.) and represents one minute of the day, ordered from 0 (00:00, i.e. 12:00 AM) to 1439 (23:59).
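The minute numbering above maps directly onto clock time. A minimal Python sketch of the conversion in both directions is shown here; the function names are illustrative and not part of the dataset tooling.

```python
from datetime import time

MINUTES_PER_DAY = 1440  # one MINUTE element per minute of the day


def minute_id_to_time(minute_id):
    """Convert a minute id (0..1439) to a time of day."""
    if not 0 <= minute_id < MINUTES_PER_DAY:
        raise ValueError(f"minute id out of range: {minute_id}")
    hours, minutes = divmod(minute_id, 60)
    return time(hour=hours, minute=minutes)


def time_to_minute_id(t):
    """Convert a time of day back to its minute id (seconds are ignored)."""
    return t.hour * 60 + t.minute


print(minute_id_to_time(0))     # 00:00:00
print(minute_id_to_time(1439))  # 23:59:00
```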
Each minute contains: 0 or 1 location information (LOCATION tag), 0 or 1 activity information (ACTIVITY tag), biometrics, 0 or more captured images (an IMAGES tag with IMAGE child elements, each of which has a relative path to the image and a unique image ID), and 0 or 1 MUSIC tag giving details of the music listened to at that point in time.
- The location information is captured by the Moves app (https://www.moves-app.com/) and refers either to semantic locations (Home, Work, DCU Computing building, GYM, name of a store, etc.) or to landmark locations registered by Moves. This tag can contain information in several languages. For locations that are not (HOME) or (WORK), the GPS locations are provided.
An example of the XML for one minute is provided, along with examples of the daily metadata for u1 (user 1).
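To illustrate the structure described above, the following Python sketch walks a small hand-written XML fragment and lists the minutes that have images. It is not taken from the released files. The lower-case tag names, the `name` child of the location element, and the `image-id` / `image-path` children of each image are assumptions made for this example and should be checked against the real XML before use.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment that mimics the described layout (assumed tag names).
SAMPLE = """
<users>
  <user id="u1">
    <days>
      <day>
        <images-directory>u1/day1</images-directory>
        <minutes>
          <minute id="0"/>
          <minute id="480">
            <location><name>Home</name></location>
            <activity>walking</activity>
            <images>
              <image><image-id>img_a</image-id><image-path>u1/day1/a.jpg</image-path></image>
              <image><image-id>img_b</image-id><image-path>u1/day1/b.jpg</image-path></image>
            </images>
          </minute>
        </minutes>
      </day>
    </days>
  </user>
</users>
"""


def iter_minutes_with_images(root):
    """Yield (user_id, minute_id, location, activity, image_ids) per minute with images."""
    for user in root.iter("user"):
        user_id = user.get("id")
        for minute in user.iter("minute"):
            image_ids = [img.findtext("image-id") for img in minute.iter("image")]
            if not image_ids:
                continue  # most minutes of the day carry no images
            yield (
                user_id,
                int(minute.get("id")),
                minute.findtext("location/name"),  # None when no location is present
                minute.findtext("activity"),       # None when no activity is present
                image_ids,
            )


root = ET.fromstring(SAMPLE.strip())
records = list(iter_minutes_with_images(root))
print(records)
```

For the full metadata file, a streaming parser such as `xml.etree.ElementTree.iterparse` may be preferable to loading the whole document into memory at once.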