
PlanetWikimedia

From BrightByte


This page defines a news feed available in RSS and Atom format.
Below is a preview of the current contents of the feed.

RSS feed - PlanetWikimedia
Atom feed - PlanetWikimedia

The PlanetWikimedia feed combines the MediaWiki and Wikimedia feeds for use by http://en.planet.wikimedia.org/

Versioning Structured Data

Daniel at BrightByte, 15:11, 3 August 2010

Free Content

There has long been talk about a "data wiki", that is, a way to collect and maintain structured, factual data in a collaborative, wiki-like fashion. The most obvious application for this would be to manage the information we now see in Wikipedia's infoboxes on the right side of many articles. The basic requirements for such a system are:

  1. centralized. Data used on several web pages (wikis) is maintained in one place. There may, however, be multiple data wikis for different kinds of data.
  2. multilingual. If values are language-specific, it should be possible to enter a value for each language, and there should be a mechanism for selecting a language (or a preference list of languages) when querying results.
  3. versioned. The system must provide a mechanism to store all old revisions of a record, make them available upon request, and present differences between arbitrary revisions of records.
  4. scalable. The system should be able to handle dozens or hundreds of millions of records, with up to a hundred properties each, and with hundreds of revisions for each record.
  5. flexible. It should be easy to introduce new types of records and modify the specification of existing records, without disturbing the system.

Requirements 1, 4 and 5 are met more or less by existing document-based database systems like MongoDB, CouchDB or even Lucene. Multilingual values can be added without much trouble if the DB supports complex data values. Versioning, however, is a bit more tricky: none of the existing systems seems to support it.

With a bit of thought, however, versioning can be implemented on top of a regular document-based system (thank you, Dirk). In order to achieve this, we introduce meta-properties that are not part of the actual record's data, but are used for management. As a convention, we start the names of these properties with an underscore "_". We would need at least the following: [...Versioning Structured Data...]
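To make the general idea concrete, here is a minimal sketch in Python of revisions stored as separate documents with underscore-prefixed meta-properties. The excerpt above is cut off before the post's own list of meta-properties, so the names used here (`_record`, `_revision`, `_latest`), the `VersionedStore` class and the `diff` helper are all hypothetical and are not the author's design. A plain in-memory list stands in for a real document database.

```python
import copy


class VersionedStore:
    """In-memory stand-in for a document database (hypothetical helper)."""

    def __init__(self):
        # Every revision of every record is kept as its own document.
        self._docs = []

    def latest(self, record_id):
        """Return the current revision of a record, or None if it is unknown."""
        for doc in self._docs:
            if doc["_record"] == record_id and doc["_latest"]:
                return doc
        return None

    def revision(self, record_id, number):
        """Return one specific revision of a record, or None."""
        for doc in self._docs:
            if doc["_record"] == record_id and doc["_revision"] == number:
                return doc
        return None

    def save(self, record_id, data):
        """Store data as a new revision and return its revision number."""
        previous = self.latest(record_id)
        if previous is not None:
            previous["_latest"] = False  # the old revision stays, but is no longer current
        doc = copy.deepcopy(data)
        # Meta-properties (assumed names) live next to the data, marked by "_".
        doc["_record"] = record_id
        doc["_revision"] = 1 if previous is None else previous["_revision"] + 1
        doc["_latest"] = True
        self._docs.append(doc)
        return doc["_revision"]


def diff(old, new):
    """Return {property: (old_value, new_value)} for data properties that differ."""
    keys = {k for k in old.keys() | new.keys() if not k.startswith("_")}
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


store = VersionedStore()
store.save("berlin", {"population": 3400000})
store.save("berlin", {"population": 3450000, "country": "Germany"})
changes = diff(store.revision("berlin", 1), store.revision("berlin", 2))
```

Because old revisions are never overwritten, differences between arbitrary revisions can be computed by comparing two stored documents while ignoring the underscore-prefixed properties.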

(Talk:Versioning Structured Data)

Neo4j

Daniel at BrightByte, 20:15, 28 July 2010

Free Content

neo4j is a graph database written in Java (neo4j.org). I recently poked at it a little to see if it could be used to make fast queries over Wikipedia's category structure.

The Problem

Using the category structure when searching content on Wikipedia, or when looking for maintenance tasks in a specific topic area, has long been a pending item on the wishlist of a lot of people. Some years back, I wrote catscan to address the issue, but it's slow, truncates results, is prone to failure, and is generally ugly. So I'm looking for better ways to do this, and neo4j looks like an option.

But first off, a closer look at the problem: Categories on Wikipedia are not tags: they can't easily be combined (intersected), but they can be put into relation to each other (making subcategories). A category can be a subcategory of several other categories: American Writers may be a subcategory of American people and Writers. By convention, there should be a single root category, and there should be no cycles in the category structure, so the resulting graph is a directed graph that has no cycles and is (weakly) connected. This is also called a poly-hierarchy. However, there is nothing that actually prevents cycles, and nothing that forces the structure to be connected. So, both loops and islands may occur.

The most wanted feature now is commonly called deep category intersection: we want all pages that are contained in two categories, while also considering all of their subcategories. Formally, this is the intersection of the transitive closures of the two categories along the subcategory relation. Calculating the transitive closure is typically done by recursively evaluating all subcategories. However, this is something traditional relational database systems are particularly bad at - it's only possible with lots of individual queries, which makes the process quite slow.
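As an illustration of the definition above, here is a small in-memory sketch in Python. It is not how catscan or neo4j work; the function names, the dictionary layout and the toy data are assumptions made for the example. The traversal keeps a set of visited categories so that the loops mentioned earlier do not cause endless recursion.

```python
from collections import deque


def transitive_closure(root, subcats):
    """Return root plus every category reachable via the subcategory relation."""
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in subcats.get(current, ()):
            if child not in seen:  # skip categories already visited (handles loops)
                seen.add(child)
                queue.append(child)
    return seen


def pages_in_tree(root, subcats, members):
    """Collect the pages of a category and of all its (deep) subcategories."""
    pages = set()
    for category in transitive_closure(root, subcats):
        pages.update(members.get(category, ()))
    return pages


def deep_intersection(cat_a, cat_b, subcats, members):
    """Pages that appear in both category trees."""
    return pages_in_tree(cat_a, subcats, members) & pages_in_tree(cat_b, subcats, members)


# Toy data (made up for this example): category -> direct subcategories
subcats = {
    "American people": ["American writers"],
    "Writers": ["American writers", "Poets"],
    "Poets": ["Writers"],  # deliberate loop
}
# Toy data: category -> pages directly in that category
members = {
    "American people": ["Abraham Lincoln"],
    "American writers": ["Mark Twain"],
    "Poets": ["Rainer Maria Rilke"],
}

result = deep_intersection("American people", "Writers", subcats, members)
```

With this toy data, `result` contains only "Mark Twain", the one page reachable from both roots. On a real wiki each `subcats.get(...)` lookup would be a database query, which is exactly the per-level cost described above.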

== The Idea == [...Neo4j...]

(Talk:Neo4j)

Hack the Wiki (26c3)

Daniel at BrightByte, 11:51, 4 December 2009

I'm trying to organize a hack the wiki corner at the upcoming 26th Chaos Communication Congress (Berlin, Dec 27-30). The idea is to have a place where people interested in contributing to the Wikipedia/MediaWiki user experience on the technical level can discuss their ideas and get help implementing them - be it by writing extensions, toolserver tools, JavaScript gadgets, API-based bots, or whatever.

I'm looking for people interested in being available for questions and discussions about any aspect of interacting with Wikipedia and/or MediaWiki in code. There has been some criticism from the German hacker community of the "wiki 2000" experience Wikipedia supposedly offers. There's quite a bit of energy directed at hacking up alternative ways to access and use Wikipedia content there. I'd like to channel some of that drive into improvements usable directly in MediaWiki or on the Wikimedia sites.

Note that this is my personal pet project, nothing official by WMDE. At least, not yet. If there are enough people interested, I hope we can get support for things like travel costs from WMDE. So please mail me (daniel dot kinzler at wikimedia dot de) if you are interested, so I can get things organized.

Oh, and: that congress is great fun, with tons of cool blinking stuff and thousands of geeks - well worth going to in any case!

(Talk:Hack the Wiki (26c3))

(no comments yet)