There has long been talk about a "data wiki", that is, a way to collect and maintain structured, factual data in a collaborative, wiki-like fashion. The most obvious application for this would be to manage the information we now see in Wikipedia's infoboxes on the right side of many articles. The basic requirements for such a system are:
- centralized. Data used on several web pages (wikis) is maintained in one place. There may, however, be multiple data wikis for different kinds of data.
- multilingual. If values are language-specific, it should be possible to enter a value for each language, and there should be a mechanism for selecting a language (or a preference list of languages) when querying results.
- versioned. The system must provide a mechanism to store all old revisions of a record, make them available upon request, and present differences between arbitrary revisions of records.
- scalable. The system should be able to handle dozens or hundreds of millions of records, with up to a hundred properties each, and with hundreds of revisions for each record.
- flexible. It should be easy to introduce new types of records and modify the specification of existing records, without disturbing the system.
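To make the multilingual requirement a bit more concrete, here is a minimal Python sketch. It assumes that a language-specific value is stored as a nested mapping from language code to value, and that a query carries a preference list of languages. The function name `pick_value` and the field name `label` are made up for illustration; they are not part of any existing system.

```python
from typing import Mapping, Optional, Sequence


def pick_value(values: Mapping[str, str], preferred: Sequence[str]) -> Optional[str]:
    """Return the value for the first preferred language that has one."""
    for lang in preferred:
        if lang in values:
            return values[lang]
    # None of the preferred languages has a value.
    return None


# A record with one language-specific property (hypothetical layout).
record = {
    "label": {"en": "Germany", "de": "Deutschland", "fr": "Allemagne"},
}

# Ask for Dutch first, then fall back to German, then English.
chosen = pick_value(record["label"], ["nl", "de", "en"])
```

Here `chosen` falls back to the German value because no Dutch one exists. A real system would also have to decide what to return when none of the preferred languages is available; this sketch simply returns `None`.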
Requirements 1, 4 and 5 (centralized, scalable, flexible) are more or less met by existing document-based database systems like MongoDB, CouchDB or even Lucene. Multilingual values can be added without much trouble if the DB supports complex data values. Versioning, however, is a bit more tricky: none of the existing systems seems to support it.
With a bit of thought, however, versioning can be implemented on top of a regular document-based system (thank you, Dirk). To achieve this, we introduce meta-properties that are not part of the actual record's data but are used for management. As a convention, we start the names of these properties with an underscore "_". We would need at least the following: [...Versioning Structured Data...]
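The general idea can be sketched with a small in-memory stand-in for a document store, written in Python. This is only an illustration under stated assumptions: every revision is kept as its own document, and the meta-property names `_record`, `_revision` and `_latest` are invented here for the example; they are not necessarily the ones this post has in mind. The class `VersionedStore` is likewise hypothetical.

```python
import copy
from typing import Any, Dict, List, Optional, Tuple


class VersionedStore:
    """Toy document store that keeps every revision as a separate document."""

    def __init__(self) -> None:
        self._docs: List[Dict[str, Any]] = []

    def save(self, record_id: str, data: Dict[str, Any]) -> int:
        """Store `data` as a new revision of `record_id`; return its number."""
        if any(key.startswith("_") for key in data):
            raise ValueError("names starting with '_' are reserved for meta-properties")
        older = [d for d in self._docs if d["_record"] == record_id]
        for doc in older:
            doc["_latest"] = False  # only the newest revision is flagged
        revision = len(older) + 1
        doc = copy.deepcopy(data)
        # Hypothetical meta-properties, kept apart from the data by the "_" prefix.
        doc.update({"_record": record_id, "_revision": revision, "_latest": True})
        self._docs.append(doc)
        return revision

    def get(self, record_id: str, revision: Optional[int] = None) -> Dict[str, Any]:
        """Return the data of one revision (the latest if none is given)."""
        for doc in self._docs:
            if doc["_record"] != record_id:
                continue
            if (revision is None and doc["_latest"]) or doc["_revision"] == revision:
                # Strip meta-properties so callers only see the record's data.
                return {k: copy.deepcopy(v) for k, v in doc.items() if not k.startswith("_")}
        raise KeyError((record_id, revision))

    def diff(self, record_id: str, old: int, new: int) -> Dict[str, Tuple[Any, Any]]:
        """Map each changed property to its (old, new) pair of values."""
        a, b = self.get(record_id, old), self.get(record_id, new)
        return {k: (a.get(k), b.get(k)) for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


store = VersionedStore()
first = store.save("berlin", {"label": {"en": "Berlin"}, "country": "Germany"})
second = store.save("berlin", {"label": {"en": "Berlin", "de": "Berlin"}, "country": "Germany"})
changes = store.diff("berlin", first, second)
```

Keeping old revisions as ordinary documents means the underlying database needs no special support for versioning. The price is that every query for current data has to filter on the "latest" flag, and that saving a revision touches more than one document.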






