Home of the iData Project

The iData Project

The Great Idea

Today, information technology evolves very quickly. The annual performance increase of hardware is incredible and has led to applications such as real-time video streaming, VoIP and on-the-fly video compression. Most of the efforts of hardware engineers are focused on performance tuning and memory space enlargement. Software engineers take advantage of these improvements and implement sophisticated applications that put very demanding tasks such as video mastering and digital image processing within the reach of everyone. However, what we still lack is a profound reflection on the data that we produce and on its durability and accessibility over time. Fifty years from now, will we still be able to view the photos taken with our digital cameras? Will MP3 remain a common audio file format?

From a personal point of view, I have learned that I can easily enjoy photos taken more than 100 years ago. No special viewing devices are necessary. Scratches do not compromise the message. What is more, printed books last for several centuries. They are accessible to everyone, and they seem to be very robust. Some characters missing? It doesn't matter! Most of the information is still there. A missing page rarely compromises the understanding of a book. The "book format" is definitely fail-safe.

But what about digital data? Firstly, we need special devices and special software to access the data. A magnetic tape produced in the 1970s is likely to be unreadable. The hardware has vanished. And even if we find a tape drive, do we have the necessary drivers? Do we know the format of the data on the tape? Reading and interpreting such tapes will be work for future generations of archaeologists ;-) Secondly, digital data is very susceptible to physical and logical damage. This is mainly due to the high storage density and to the lack of redundancy. A deep scratch on a data CD can mean the loss of all the data.

Over the last 40 years, many different technologies and concepts have emerged, and only a few of them have survived. To cite a few of the survivors: the ASCII table, the RS-232 serial interface, FORTRAN, CRTs, etc. Many technologies have vanished: 5.25" and 3.5" floppies, music and data cassettes, LPs, ZIP drives, a-very-long-list-of-devices-and-software.

With every transition to a new hardware or software standard we are forced to convert and migrate our data. This is tedious and error-prone. Of course, it is not desirable to stop the development of new technologies. Nobody would like to stay with 5.25" floppy disks. However, we could make a much greater effort to preserve the format, i.e. the structure, of our data.

The iData project aims to develop a new data structuring framework that helps to preserve our data for the duration of our lives or even for future generations. To this end, a new file format is being developed that combines the data with the logic needed to access the data. Tools will be developed to create such data files and to read the data.

The minimal feature set for the new data file format is:
  • The data stored in iData files must be readable and writable for a very long time. This can only be achieved if we combine the data and the logic to read/write the data in one file. It is therefore necessary to add encoder/decoder code to the data file (embedded encoder/decoder). The encoder/decoder code must be stored in a universal and extensible interpreted language in the style of Java or Python. This code provides a standardized interface to the data. A single interpreter application can thus read virtually any video or audio format. As an example, the video interface provides an abstract function getFrame(int n) that returns a byte array with the image data. The decoder code implements this function and takes care of the conversion from the internal video format (MPEG-2, DivX, etc.) to the byte array.
  • iData files must be robust. The iData file format will therefore provide redundancy and integrity checking (checksums).
  • iData files should contain easily accessible (e.g. ASCII text) information about the data structure. This is to prevent the loss of knowledge about how to access the data.
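
The embedded decoder idea from the first requirement can be sketched roughly as follows. This is only an illustration in Python: the class `VideoInterface`, its methods `get_frame` (standing in for getFrame(int n)) and `frame_count`, the toy decoder `RawGrayDecoder` and the host function `show_all` are all names invented here, and the uncompressed grayscale format is an assumption made to keep the example short.

```python
from abc import ABC, abstractmethod
from typing import List


class VideoInterface(ABC):
    """Hypothetical standardized interface that every video decoder exposes."""

    @abstractmethod
    def frame_count(self) -> int:
        """Number of frames available in the payload."""

    @abstractmethod
    def get_frame(self, n: int) -> bytes:
        """Return the raw image bytes of frame n."""


class RawGrayDecoder(VideoInterface):
    """Toy embedded decoder for an invented internal format:
    consecutive uncompressed frames of width * height 8-bit pixels."""

    def __init__(self, payload: bytes, width: int, height: int):
        self._payload = payload
        self._frame_size = width * height

    def frame_count(self) -> int:
        return len(self._payload) // self._frame_size

    def get_frame(self, n: int) -> bytes:
        if not 0 <= n < self.frame_count():
            raise IndexError(f"frame {n} out of range")
        start = n * self._frame_size
        return self._payload[start:start + self._frame_size]


def show_all(video: VideoInterface) -> List[bytes]:
    """Host-side code: it knows the interface, not the internal format."""
    return [video.get_frame(i) for i in range(video.frame_count())]
```

Because `show_all` touches only the abstract interface, a decoder for a different internal format could be shipped inside another file without changing the host application.
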
Further desired features are:
  • Data encryption.
  • to be completed...
2005-02-23, manusi

Implementation Ideas

An iData file must be structured in such a way that transmission errors can be detected and corrected (if the error rate is not too high). For this it is useful to organize the data in chunks (frames) of a defined, fixed length. Each frame is delimited by a start and an end tag. The delimiting tags should have a structure that is easy to parse. It is also useful to add a frame enumerator (identifier), e.g. base tag + enumerator. The frame length and the structure of the tags must be declared very early in the data file header.
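
A minimal sketch of such framing, under several assumptions: the tag layout (`<F` plus an eight-digit enumerator plus `>` as start tag, `</F>` as end tag), the 16-byte payload length and the use of CRC-32 as the per-frame checksum are all choices made for this example, not part of a defined iData format. The sketch only detects damaged frames; actually recovering them would need added redundancy, which is not shown.

```python
import zlib
from typing import List, Optional

FRAME_LEN = 16                 # payload bytes per frame (assumed value)
START = b"<F%08d>"             # base tag + enumerator (hypothetical layout)
END = b"</F>"
HEAD_LEN = len(START % 0)
FRAME_SIZE = HEAD_LEN + FRAME_LEN + 8 + len(END)   # 8 hex digits of CRC-32


def build_frames(data: bytes) -> bytes:
    """Split data into fixed-length, tagged and checksummed frames."""
    out = bytearray()
    for i in range(0, len(data), FRAME_LEN):
        # The last chunk is zero-padded; the true length would live in the header.
        chunk = data[i:i + FRAME_LEN].ljust(FRAME_LEN, b"\0")
        crc = b"%08x" % zlib.crc32(chunk)
        out += START % (i // FRAME_LEN) + chunk + crc + END
    return bytes(out)


def read_frames(blob: bytes) -> List[Optional[bytes]]:
    """Return each frame's payload, or None where the frame is damaged."""
    result = []
    for n in range(len(blob) // FRAME_SIZE):
        frame = blob[n * FRAME_SIZE:(n + 1) * FRAME_SIZE]
        head = frame[:HEAD_LEN]
        chunk = frame[HEAD_LEN:HEAD_LEN + FRAME_LEN]
        crc = frame[HEAD_LEN + FRAME_LEN:HEAD_LEN + FRAME_LEN + 8]
        tail = frame[HEAD_LEN + FRAME_LEN + 8:]
        intact = (head == START % n and tail == END
                  and crc == b"%08x" % zlib.crc32(chunk))
        result.append(chunk if intact else None)
    return result
```

Since every frame has the same size, a reader can step over a damaged frame and continue with the next one, so a local error does not make the rest of the file unreadable.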

Basic logical layout of an iData file:

Tag: file type identifier, e.g. iData-1.0
Header: contains information such as the frame length, frame tag structure, owner, data type, etc.
Decoder/Encoder: the decoder and/or encoder meta code (interpreter code)
Data: the data
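
The four-part layout above could be packed and unpacked along these lines. This is a sketch under assumptions: only the identifier `iData-1.0` comes from the text, while the `key: value` ASCII header lines, the `decoder-length` and `data-length` fields, the blank line that ends the header and the function names `pack` and `unpack` are invented for illustration. Framing and checksums are left out to keep the example short.

```python
from typing import Dict, Tuple

MAGIC = b"iData-1.0"   # file type identifier from the layout above


def pack(header: Dict[str, str], decoder: str, data: bytes) -> bytes:
    """Assemble tag, ASCII header, decoder code and data into one blob."""
    code = decoder.encode("ascii")
    fields = dict(header)
    # Hypothetical length fields so a reader can find the section boundaries.
    fields["decoder-length"] = str(len(code))
    fields["data-length"] = str(len(data))
    lines = [MAGIC] + [f"{k}: {v}".encode("ascii") for k, v in fields.items()]
    # A blank line marks the end of the human-readable header.
    return b"\n".join(lines) + b"\n\n" + code + data


def unpack(blob: bytes) -> Tuple[Dict[str, str], str, bytes]:
    """Split a blob produced by pack() back into header, decoder and data."""
    head, _, rest = blob.partition(b"\n\n")
    lines = head.split(b"\n")
    if lines[0] != MAGIC:
        raise ValueError("not an iData-1.0 file")
    fields = dict(line.decode("ascii").split(": ", 1) for line in lines[1:])
    n_code = int(fields["decoder-length"])
    n_data = int(fields["data-length"])
    code = rest[:n_code].decode("ascii")
    data = rest[n_code:n_code + n_data]
    return fields, code, data
```

Keeping the tag and header as plain ASCII means that the first lines of a file stay legible in any text viewer, even if no iData tool is available.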

Suggested file name extension: *.ida

Project Status

Collection of first ideas. First tiny bits of code to test for feasibility.
Setup of the web site.


The author can be reached via email at manu ("AT") manusi ("DOT") org

Last update: 2005-03-02.

This document is copyrighted © 2005 by Manuel Sickert. All Rights Reserved.