›› MIND Lab Seminar Series: 22 October 2007

Data Genome: A Model for Data Evolution

Grace Liu
Ph.D. Student in Computer Science, South China University of Technology, China

›› Logistics

Date: Monday, 29 October 2007

Time: 2:00p to 3:00p

Location: AVW 3258

›› Abstract

Modern information systems often process data that has been transferred, transformed, or integrated from a variety of sources. In many application domains, information about how data items were derived is crucial. Many researchers are currently investigating a kind of metadata called data provenance, but provenance information must be collected and maintained explicitly by the dataset maintainer or by a specialized provenance management system. We therefore investigate how to support derivation information for applications within the dataset itself. We propose that every dataset has a unique data genome that evolves as the dataset evolves. The data genome is part of the data and actively records derivation information for it. The characteristics of data genomes show that the lineage of datasets can be uncovered by analyzing their data genomes. We also present computations on data genomes, such as clone, transmit, mutate, and introject, to show how a data genome evolves to provide derivation information from the dataset itself.