News

From BrightByte


This page defines a news feed available in RSS and Atom format.
Below is a preview of the current contents of the feed.

RSS feed - News
Atom feed - News

The news feed includes all major updates to brightbyte.de.

Versioning Structured Data

Daniel at BrightByte, 15:11, 3 August 2010

Free Content

There has long been talk about a "data wiki", that is, a way to collect and maintain structured, factual data in a collaborative, wiki-like fashion. The most obvious application for this would be to manage the information we now see in the infoboxes on the right side of many Wikipedia articles. The basic requirements for such a system are:

  1. centralized. Data used on several web pages (wikis) is maintained in one place. There may, however, be multiple data wikis for different kinds of data.
  2. multilingual. If values are language-specific, it should be possible to enter a value for each language, and there should be a mechanism for selecting a language (or a preference list of languages) when querying results.
  3. versioned. The system must provide a mechanism to store all old revisions of a record, make them available upon request, and present differences between arbitrary revisions of records.
  4. scalable. The system should be able to handle dozens or hundreds of millions of records, with up to a hundred properties each, and with hundreds of revisions for each record.
  5. flexible. It should be easy to introduce new types of records and modify the specification of existing records without disturbing the system.

Requirements 1, 4 and 5 are more or less met by existing document-based database systems like MongoDB, CouchDB or even Lucene. Multilingual values can be added without much trouble if the DB supports complex data values. Versioning, however, is trickier: none of the existing systems seem to support it.
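To illustrate the point about complex data values, here is a minimal Python sketch of requirement 2. It assumes that a language-specific property is stored as a dict keyed by language code; the record layout and the helper names `resolve_value` and `localize` are hypothetical, not the API of MongoDB, CouchDB or Lucene.

```python
# A record as it might be stored in a document database (hypothetical layout):
# language-independent properties hold plain values, language-specific ones
# hold a dict keyed by language code.
record = {
    "country_code": "DE",
    "name": {"en": "Munich", "de": "München"},
    "description": {"en": "city in Bavaria", "de": "Stadt in Bayern"},
}


def resolve_value(value: object, languages: list[str]) -> object:
    """Return the first translation available in the preference list.

    Non-dict values are language-independent and returned unchanged;
    None means that none of the preferred languages is available.
    """
    if not isinstance(value, dict):
        return value
    for lang in languages:
        if lang in value:
            return value[lang]
    return None


def localize(record: dict, languages: list[str]) -> dict:
    """Resolve every property of a record for one language preference list."""
    return {key: resolve_value(val, languages) for key, val in record.items()}


print(localize(record, ["fr", "en"]))
# {'country_code': 'DE', 'name': 'Munich', 'description': 'city in Bavaria'}
```

Treating every dict as a multilingual value is a simplification; a real schema would need to declare which properties are language-specific.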

With a bit of thought, however, versioning can be implemented on top of a regular document-based system (thank you, Dirk). To achieve this, we introduce meta-properties that are not part of the record's actual data but are used for management. By convention, the names of these properties start with an underscore "_". We would need at least the following: [...Versioning Structured Data...]
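The full list of meta-properties is not reproduced in this preview. Purely as an illustration of the idea, the Python sketch below assumes three hypothetical meta-properties, `_record` (the ID of the logical record), `_rev` (the revision number) and `_latest` (marks the current revision), and uses an in-memory list in place of a real document database. Every revision is stored as its own document, and arbitrary revisions can be diffed (requirement 3).

```python
import copy
from typing import Any, Optional


class VersionedStore:
    """In-memory stand-in for a document database that keeps every revision
    of a record as a separate document. Meta-properties start with "_"."""

    def __init__(self) -> None:
        self.docs: list[dict[str, Any]] = []

    def save(self, record_id: str, data: dict[str, Any]) -> int:
        """Store a new revision of the record and return its revision number."""
        current = self.latest(record_id)
        rev = 1 if current is None else current["_rev"] + 1
        if current is not None:
            current["_latest"] = False  # only one revision is marked as current
        doc = copy.deepcopy(data)
        doc.update({"_record": record_id, "_rev": rev, "_latest": True})
        self.docs.append(doc)
        return rev

    def latest(self, record_id: str) -> Optional[dict[str, Any]]:
        """The stored document of the current revision, or None."""
        for doc in self.docs:
            if doc["_record"] == record_id and doc["_latest"]:
                return doc
        return None

    def revision(self, record_id: str, rev: int) -> dict[str, Any]:
        """The record's data as of a given revision, without meta-properties."""
        for doc in self.docs:
            if doc["_record"] == record_id and doc["_rev"] == rev:
                return {k: v for k, v in doc.items() if not k.startswith("_")}
        raise KeyError(f"{record_id} has no revision {rev}")

    def diff(self, record_id: str, old: int, new: int) -> dict[str, tuple]:
        """Map each changed property to an (old value, new value) pair;
        None stands for a property that is absent in that revision."""
        a, b = self.revision(record_id, old), self.revision(record_id, new)
        return {k: (a.get(k), b.get(k)) for k in a.keys() | b.keys() if a.get(k) != b.get(k)}


store = VersionedStore()
store.save("munich", {"name": "Munich", "state": "Bavaria"})
store.save("munich", {"name": "Munich", "state": "Bavaria", "country": "Germany"})
print(store.diff("munich", 1, 2))  # {'country': (None, 'Germany')}
```

In a real document database, the lookups by `_record`, `_rev` and `_latest` would be served by an index instead of a linear scan.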

(Talk:Versioning Structured Data)

Neo4j

Daniel at BrightByte, 20:15, 28 July 2010

Free Content

neo4j is a graph database written in Java (neo4j.org). I recently poked at it a little to see if it could be used to make fast queries over Wikipedia's category structure.

The Problem

Using the category structure when searching content on Wikipedia, or when looking for maintenance tasks in a specific topic area, has long been on the wishlist of a lot of people. Some years back, I wrote catscan to address the issue, but it's slow, truncates results, is prone to failure, and is generally ugly. So I'm looking for better ways to do this, and neo4j looks like an option.

But first off, a closer look at the problem: categories on Wikipedia are not tags. They can't easily be combined (intersected), but they can be put into relation to each other (making subcategories). A category can be a subcategory of several other categories: American writers may be a subcategory of both American people and Writers. By convention, there should be a single root category, and there should be no cycles in the category structure, so the resulting graph is a directed acyclic graph that is (weakly) connected. This is also called a poly-hierarchy. However, nothing actually prevents cycles, and nothing forces the structure to be connected. So both loops and islands may occur.
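Since the structure is only a poly-hierarchy by convention, a tool working on it has to cope with loops and islands. A minimal Python sketch, assuming the category graph is held in memory as a dict mapping each category to its direct subcategories (the example data and function names are made up for illustration): `unreachable_from` finds categories that cannot be reached from the root, which includes every island, and `back_edges` reports subcategory links that close a cycle.

```python
from collections import deque


def all_categories(subcats: dict[str, list[str]]) -> set[str]:
    return set(subcats) | {c for children in subcats.values() for c in children}


def unreachable_from(root: str, subcats: dict[str, list[str]]) -> set[str]:
    """Categories that cannot be reached from the root via subcategory links."""
    seen = {root}
    queue = deque([root])
    while queue:
        for child in subcats.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return all_categories(subcats) - seen


def back_edges(subcats: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Links (parent, child) that close a cycle, found by an iterative DFS.

    A link points "back" if its child is still on the current DFS path.
    The graph is acyclic exactly when this list is empty.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color: dict[str, int] = {}
    result = []
    for start in sorted(all_categories(subcats)):
        if color.get(start, WHITE) != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(subcats.get(start, [])))]
        while stack:
            node, children = stack[-1]
            child = next(children, None)
            if child is None:  # all subcategories of node handled
                color[node] = BLACK
                stack.pop()
            elif color.get(child, WHITE) == GRAY:
                result.append((node, child))
            elif color.get(child, WHITE) == WHITE:
                color[child] = GRAY
                stack.append((child, iter(subcats.get(child, []))))
    return result


subcats = {
    "Contents": ["People", "Literature"],
    "People": ["American people", "Writers"],
    "American people": ["American writers"],
    "Writers": ["American writers"],
    "Literature": ["Writers"],
    "American writers": ["Writers"],  # mistaken link that creates a loop
    "Orphaned topics": ["Stubs"],     # an island
}
print(unreachable_from("Contents", subcats))
print(back_edges(subcats))
```

Which link of a loop gets reported depends on the traversal order; removing all reported links leaves an acyclic graph.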

The most wanted feature is commonly called deep category intersection: we want all pages that are contained in two categories, while also considering all of their subcategories. Formally, this is the intersection of the sets of pages found in the transitive closures of the two categories along the subcategory relation. The transitive closure is typically calculated by recursively evaluating all subcategories. However, this is something traditional relational database systems are particularly bad at: it's only possible with lots of individual queries, which makes the process quite slow.
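Stripped of any database, the definition can be sketched in a few lines of Python. This is not the neo4j-based approach the post goes on to describe; it assumes the category graph and the category memberships are already in memory as dicts (`subcats` and `pages`, both hypothetical) and computes each transitive closure with a breadth-first search. The `seen` set keeps the search from running forever if the graph contains loops.

```python
from collections import deque


def closure(root: str, subcats: dict[str, list[str]]) -> set[str]:
    """The category itself plus all direct and indirect subcategories."""
    seen = {root}
    queue = deque([root])
    while queue:
        for child in subcats.get(queue.popleft(), []):
            if child not in seen:  # also guards against loops
                seen.add(child)
                queue.append(child)
    return seen


def deep_pages(cat: str, subcats: dict, pages: dict) -> set[str]:
    """All pages in the category or any of its subcategories."""
    return set().union(*(pages.get(c, ()) for c in closure(cat, subcats)))


def deep_intersection(cat_a: str, cat_b: str, subcats: dict, pages: dict) -> set[str]:
    return deep_pages(cat_a, subcats, pages) & deep_pages(cat_b, subcats, pages)


subcats = {
    "American people": ["American writers"],
    "Writers": ["American writers", "German writers"],
}
pages = {
    "American people": ["Abraham Lincoln"],
    "American writers": ["Mark Twain"],
    "German writers": ["Thomas Mann"],
}
print(deep_intersection("American people", "Writers", subcats, pages))  # {'Mark Twain'}
```

Each step of the search corresponds to one "subcategories of X" lookup, which is why doing this with one SQL query per category is so slow.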

The Idea

[...Neo4j...]

(Talk:Neo4j)

Hack the Wiki (26c3)

Daniel at BrightByte, 11:51, 4 December 2009

I'm trying to organize a "hack the wiki" corner at the upcoming 26th Chaos Communication Congress (Berlin, Dec 27-30). The idea is to have a place where people interested in improving the Wikipedia/MediaWiki user experience on a technical level can discuss their ideas and get help implementing them, be it by writing extensions, toolserver tools, JavaScript gadgets, API-based bots, or whatever.

I'm looking for people who would be available for questions and discussions about any aspect of interacting with Wikipedia and/or MediaWiki in code. There has been some criticism from the German hacker community of the "wiki 2000" experience Wikipedia supposedly offers, and quite a bit of energy there is directed at hacking up alternative ways to access and use Wikipedia content. I'd like to channel some of that drive into improvements usable directly in MediaWiki or on the Wikimedia sites.

Note that this is my personal pet project, nothing official from WMDE. At least, not yet. If enough people are interested, I hope we can get support for things like travel costs from WMDE. So please mail me (daniel dot kinzler at wikimedia dot de) if you are interested, so I can get things organized.

Oh, and: that congress is great fun, with tons of cool blinking stuff and thousands of geeks. Well worth going to in any case!

(Talk:Hack the Wiki (26c3))

Dnsmasq

Daniel at BrightByte, 12:05, 6 September 2009

For some applications, it is useful to have a DNS server running locally on your machine. I use dnsmasq for this. However, Ubuntu configures dnsmasq for use on a gateway by default, which is not what I want it for. I want the following things changed:
  • dnsmasq should use whatever DNS server resolvconf or dhclient determines as its upstream server.
  • all my local programs should however ask only dnsmasq to resolve names.
  • dnsmasq shall work locally only, not act as a DNS server for others in the network
  • dnsmasq shall not act as a DHCP server.

So, here's the setup for /etc/dnsmasq.conf:

# Never forward plain names (without a dot or domain part)
domain-needed
# Never forward addresses in the non-routed address spaces.
bogus-priv
 
# use the resolv.conf generated by resolvconf for upstream resolution.
# /etc/resolv.conf will initially be a symlink to that file. We will change that later, see below.
resolv-file=/etc/resolvconf/run/resolv.conf

# try upstream servers strictly in order
# useful if you want to override the upstream DNS server you get from DHCP in the resolvconf config.
strict-order

# only work locally
interface=lo
listen-address=127.0.0.1

# NOTE: dnsmasq rejects "bind-interfaces=loopback" because bind-interfaces is a plain
# flag that takes no argument. The restrictions above should be sufficient anyway.
# bind-interfaces

# no DHCP (since we only listen to loopback, we only need to exclude loopback)
no-dhcp-interface=lo
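One of the goals above is that local programs ask only dnsmasq, which means /etc/resolv.conf should end up listing 127.0.0.1 as its only nameserver (the step that replaces the symlink is not part of this preview). A small Python sketch for checking this; `nameservers` and `uses_only_local_dnsmasq` are made-up helper names, and the parser only looks at `nameserver` lines and skips `#`/`;` comment lines:

```python
def nameservers(resolv_conf: str) -> list[str]:
    """Extract the nameserver addresses from resolv.conf content, in order."""
    servers = []
    for line in resolv_conf.splitlines():
        line = line.strip()
        if not line or line[0] in "#;":  # blank line or comment
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) > 1:
            servers.append(fields[1])
    return servers


def uses_only_local_dnsmasq(resolv_conf: str) -> bool:
    """True if the local dnsmasq (listen-address=127.0.0.1) is the only nameserver."""
    return nameservers(resolv_conf) == ["127.0.0.1"]
```

For example, `uses_only_local_dnsmasq(open("/etc/resolv.conf").read())` should return True once the setup is complete.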

If you have a stupid ISP that uses wildcard A records to grab requests for unknown domains, you can filter them out like this:

# filter bogus A records
bogus-nxdomain=62.157.140.133
bogus-nxdomain=80.156.86.78
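According to the dnsmasq manual, `bogus-nxdomain` turns any reply containing one of the listed addresses into a "No such domain" reply. The Python function below only mimics that rule to make the behaviour concrete; it is not how dnsmasq is implemented, and `filter_reply` is a hypothetical name.

```python
from typing import Optional

# The wildcard addresses configured above.
BOGUS = frozenset({"62.157.140.133", "80.156.86.78"})


def filter_reply(a_records: list[str], bogus: frozenset = BOGUS) -> Optional[list[str]]:
    """Return the A records unchanged, or None (standing for NXDOMAIN)
    if any of them is one of the ISP's wildcard addresses."""
    if any(addr in bogus for addr in a_records):
        return None
    return a_records
```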

If you want to serve SRV records for special services (in this case, Jabber multi-user chat):

# The fields are <name>,<target>,<port>,<priority>,<weight>
srv-host=_conference._tcp.dell-daniel,dell-daniel,5267
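The priority and weight fields only matter when several SRV records exist for the same service. As a sketch of how a client following RFC 2782 picks among them (the lowest priority value wins, and ties are broken randomly in proportion to the weights), with `SrvRecord` and `pick_server` as illustrative names that are not part of dnsmasq:

```python
import random
from typing import NamedTuple, Optional


class SrvRecord(NamedTuple):
    target: str
    port: int
    priority: int
    weight: int


def pick_server(records: list[SrvRecord], rng: Optional[random.Random] = None) -> SrvRecord:
    """Choose one SRV record following the selection rules of RFC 2782."""
    rng = rng if rng is not None else random.Random()
    # Only records with the lowest priority value are candidates.
    best = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == best]
    # Zero-weight records go first so they keep a small chance of selection.
    ordered = sorted(candidates, key=lambda r: r.weight != 0)
    total = sum(r.weight for r in ordered)
    point = rng.randint(0, total)  # inclusive on both ends
    running = 0
    for r in ordered:
        running += r.weight
        if running >= point:
            return r
    return ordered[-1]  # not reached: the final running sum equals total
```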

Then restart dnsmasq:

> sudo /etc/init.d/dnsmasq restart

[...Dnsmasq...]
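To check that the restarted dnsmasq answers on the loopback address, `dig @127.0.0.1 <name>` works. As a dependency-free alternative, here is a Python sketch that sends a minimal RFC 1035 query over UDP. It assumes dnsmasq listens on 127.0.0.1, port 53, and it only reads the response code and answer count from the reply header; `build_query` and `ask` are hypothetical helper names.

```python
import random
import socket
import struct


def build_query(name: str, query_id: int) -> bytes:
    """A minimal DNS query (RFC 1035) for the A record of `name`.
    Labels longer than 63 bytes are not handled."""
    # Header: ID, flags (only RD = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def ask(name: str, server: str = "127.0.0.1", timeout: float = 2.0) -> tuple[int, int]:
    """Send the query via UDP and return (response code, answer count).
    Response code 0 means NOERROR, 3 means NXDOMAIN."""
    query_id = random.randrange(0x10000)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, query_id), (server, 53))
        reply, _ = sock.recvfrom(512)
    reply_id, flags, _qdcount, ancount = struct.unpack("!HHHH", reply[:8])
    if reply_id != query_id:
        raise ValueError("reply does not match the query")
    return flags & 0x000F, ancount
```

`ask("brightbyte.de")` should return a response code of 0 with at least one answer; code 3 (NXDOMAIN) is also what the bogus-nxdomain filter produces for unknown names.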

(Talk:Dnsmasq)

(no comments yet)