Metrized Small World Active Data Storage
The Metrized Small World Active Data Storage project is dedicated to building a global, distributed, cloud-like Internet-based data storage and retrieval architecture in which petabytes of data can be generated and retrieved by a large number of uncoordinated actors located all over the planet. Ultimately, such an architecture would make it possible for every physical object to have a globally accessible representation in the digital world. This benefits companies, which can store and share item-level data about their products and assets in the global “cloud storage”, possibly using Electronic Product Codes (EPCs) and RFID tags to link physical objects to their digital representations.

The core of the proposed architecture is the Metrized Small World (MSW) data structure, in which data units (e.g. descriptions of real-world objects) are dynamically consolidated into an overlay network in the form of a so-called small world graph. This network is used to navigate between data units when searching and when adding new units. Small world structures keep the average number of links per node small while ensuring that every node can be reached from any other node in a small number of link steps. As a result, no matter which data unit is chosen as the starting point of a search, the result can be obtained in a number of steps that is small compared to the size of the structure. To determine the right direction for the search and to create a structure in which similar data units are properly clustered, a metric (proximity measure) between data units and queries is added to the small world graph, resulting in the Metrized Small World data structure. For further information about the MSW structure, see our publications section.
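
To illustrate how a metric can steer navigation through such a graph, here is a minimal sketch of a generic greedy walk: starting from an arbitrary data unit, it repeatedly moves to the linked unit closest to the query and stops when no neighbour is closer. This is not the project's actual search algorithm (see the publications for that). The names `values`, `links` and `greedy_search`, the Euclidean metric and the toy coordinates are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]


def euclidean(a: Point, b: Point) -> float:
    """Assumed proximity measure; any metric on data units could be used."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


def greedy_search(
    values: Dict[str, Point],
    links: Dict[str, List[str]],
    start: str,
    query: Point,
    metric: Callable[[Point, Point], float] = euclidean,
) -> Tuple[str, List[str]]:
    """Follow links towards the query until no neighbour is closer.

    Returns the unit where the walk stopped and the path that was taken.
    """
    current = start
    path = [current]
    while True:
        # Pick the neighbour closest to the query (or stay put if there is none).
        best = min(
            links[current],
            key=lambda n: metric(values[n], query),
            default=current,
        )
        # Stop when no neighbour improves on the current unit.
        if metric(values[best], query) >= metric(values[current], query):
            return current, path
        current = best
        path.append(current)


# Hypothetical toy graph: five units on a line plus one long-range link a <-> d.
values = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.0),
          "d": (3.0, 0.0), "e": (4.0, 0.0)}
links = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"],
         "d": ["c", "e", "a"], "e": ["d"]}

found, path = greedy_search(values, links, start="a", query=(3.9, 0.0))
print(found, path)  # e ['a', 'd', 'e']
```

In this toy graph the long-range link from `a` to `d` lets the walk reach `e` in two hops instead of four. A plain greedy walk like this one can stop at a unit that is only locally closest to the query, which is one reason more reliable search algorithms are a research topic.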
Furthermore, to make the structure globally scalable, not only the data but also the data processing should be distributed. We address this requirement by making the data units active: they can communicate with other data units and with system clients by means of network messages. The computation and communication effort required for data unit search and addition can thus be offloaded from request originators to the distributed storage itself.
To explore and verify our concept, we have created a prototype consisting of a dynamically extendable set of HTTP servers that host the Active Data Units (ADUs). Each ADU consists of an XML content module and a link list module containing XLink-formatted links to other ADUs. The server software components allow ADUs to communicate by means of XML messages transmitted over HTTP.
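
As a rough illustration of what an ADU with a content module and an XLink-formatted link list might look like, the sketch below builds and reads such a document with Python's standard `xml.etree.ElementTree` module. The prototype's actual schema is not described here, so the element names (`adu`, `content`, `links`, `link`), the helper functions and the `example.org` URLs are hypothetical. Only the XLink namespace URI and the `xlink:type` and `xlink:href` attributes come from the XLink standard.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK_NS)  # serialize with the usual "xlink:" prefix


def build_adu(content_fields: Dict[str, str], neighbour_urls: List[str]) -> ET.Element:
    """Build a hypothetical ADU document: a content module plus a link list."""
    adu = ET.Element("adu")
    content = ET.SubElement(adu, "content")
    for name, value in content_fields.items():
        ET.SubElement(content, name).text = value
    links = ET.SubElement(adu, "links")
    for url in neighbour_urls:
        # XLink simple link pointing at another ADU.
        ET.SubElement(links, "link", {
            f"{{{XLINK_NS}}}type": "simple",
            f"{{{XLINK_NS}}}href": url,
        })
    return adu


def read_links(adu: ET.Element) -> List[str]:
    """Return the URLs of the ADUs this unit links to."""
    return [link.get(f"{{{XLINK_NS}}}href") for link in adu.iter("link")]


# Hypothetical unit describing a pallet, linked to two other units.
adu = build_adu(
    {"name": "pallet", "location": "warehouse-7"},
    ["http://example.org/adu/2", "http://example.org/adu/5"],
)
xml_text = ET.tostring(adu, encoding="unicode")
print(xml_text)
print(read_links(ET.fromstring(xml_text)))
```

Because the links are ordinary URLs, a unit hosted on one HTTP server can point to units hosted on any other server, which fits the dynamically extendable set of servers described above.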
The main research and development directions of our project are:
- Improvement and enhancement of data search and addition algorithms. We are working on MSW graph structure optimization, support for additional data types, and faster and more reliable search algorithms.
- Further development of the MSW Active Data Storage prototype. Our main goals here are performance improvements and implementation of new and improved core algorithms.
- Real-time visualization tools for the MSW structure graph. These tools will provide a convenient way to see how the MSW graph evolves as new data units are added.
- Finding application possibilities for the MSW Active Data Storage technology. While the ultimate goal of creating a globally accessible digital representation for every real-world object may seem distant, there are existing applications that can benefit from the global data availability and scalability offered by the MSW Active Data Storage technology. One example, mentioned above, is EPC (Electronic Product Code) product data storage and sharing.
