›› MIND Lab Seminar Series: 22 October 2007
Data Genome: A Model for Data Evolution
Grace Liu
Ph.D. Student in Computer Science, South China University of Technology, China
›› Logistics
Date: Monday, 29 October 2007
Time: 2:00p to 3:00p
Location: AVW 3258
›› Abstract
Modern information systems often process data that has been transferred, transformed, or integrated from a variety of sources. In many application domains, information about how data items were derived is crucial. Many researchers are currently investigating data provenance, a kind of metadata that records this derivation, but provenance information must be collected and maintained explicitly by the dataset maintainer or by a specialized provenance management system. We therefore investigate how to support derivation information for applications within the dataset itself. We propose that every dataset has a unique data genome that evolves as the dataset evolves. The data genome is part of the data and actively records derivation information for it. The characteristics of data genomes show that the lineage of datasets can be uncovered by analyzing their data genomes. We also present computations on data genomes, such as clone, transmit, mutate, and introject, to show how a data genome evolves to provide derivation information from the dataset itself.