RecentChanges via Jabber
Good news from FOSDEM! There was a bunch of Wikimedians there, and I had the opportunity to bug Brion and Mark about how to make things better. One of the things we talked about was making Recent Changes available via XMPP, aka Jabber. We now have a plan, and are looking for someone to go and do it :)
For a long time now, it has bugged me (and quite a few other people) that there is no decent real-time feed of the Recent Changes log. That would make life so much easier for bots that watch for vandalism etc. The only thing we have right now is the IRC channel, which has messages that are designed for people to read, not for bots to process. The message format is ambiguous, and often messages get cut off, making it impossible to get the information required. It's also hard to add new information to these messages, such as flagging or patrolling status, etc.
The RSS feed and the API are not good alternatives, since polling via HTTP is slow and causes quite a bit of server load. A "push"-based approach would be much nicer.
Using XMPP to provide a live stream of Recent Changes events seems the obvious choice: it was designed precisely for the purpose of notifying clients of events, and it can transport any kind of structured content, since it uses an extensible XML format for representing messages. In the past, I and others have thought about implementing this, but it seems that plans always ended at the point where we would need a Jabber server to be running on the Wikimedia server farm to actually make this useful. But there's little a nice chat over a couple of excellent Belgian beers can't solve! Brion really likes the idea, and Mark has agreed to set up an XMPP server for this purpose. Yay!
So, what's still missing? A way to get the information out of MediaWiki into the Jabber server. MediaWiki sends out notifications about changes as UDP packets. Currently, these UDP packets contain text pre-formatted to be used on the IRC channels. They are received by a small special program that looks at the packet to find out to which wiki (and thus to which IRC channel) it belongs, and forwards it there.
We would now need MediaWiki to send out another UDP packet, which contains the information in a machine-readable format -- I would suggest XML (because XMPP likes that), exactly in the form the API outputs it when asked for Recent Changes (to keep things nice and consistent). Currently, the formatting of the message and sending it out via UDP are hard-coded. This is really a nasty hack. It would be much nicer to have both aspects handled by separate, pluggable functions: one for formatting the message, and one for sending it out, by whatever means.
The simplest approach would be to introduce hook points for each functionality -- but then, you can only have one output format, which can be sent out via several channels. It would be nicer if formats and methods of delivery could be combined freely. I'm thinking of an interface that implements "here's a RecentChange object, send it out", which in turn can use some arbitrary implementation of "here's a RecentChange object, turn it into a string". But maybe I'm over-thinking this.
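To make the "combine formats and transports freely" idea a bit more concrete, here is a rough sketch. It is in Python rather than MediaWiki's PHP, purely for illustration, and every name in it (the RecentChange fields, Feed, format_xml, format_irc_line, udp_transport) is made up for this example; none of it is existing MediaWiki code, and the XML shape is only loosely modelled on what the API emits.

```python
import socket
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass
from typing import Callable, Iterable


@dataclass
class RecentChange:
    """Hypothetical stand-in for MediaWiki's RecentChange object."""
    wiki: str
    type: str
    title: str
    user: str
    revid: int


# "Here's a RecentChange object, turn it into a string."
Formatter = Callable[[RecentChange], str]
# "Here's a string, send it out by whatever means."
Transport = Callable[[str], None]


def format_xml(rc: RecentChange) -> str:
    """Render the change as a single <rc .../> element (assumed shape)."""
    elem = ET.Element("rc", {key: str(value) for key, value in asdict(rc).items()})
    return ET.tostring(elem, encoding="unicode")


def format_irc_line(rc: RecentChange) -> str:
    """Render a human-oriented one-liner, in the spirit of the IRC feed."""
    return f"[[{rc.title}]] {rc.type} * {rc.user} *"


def udp_transport(host: str, port: int) -> Transport:
    """Build a transport that sends each payload as one UDP datagram."""
    def send(payload: str) -> None:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(payload.encode("utf-8"), (host, port))
    return send


class Feed:
    """One freely chosen pairing of a formatter with a transport."""

    def __init__(self, formatter: Formatter, transport: Transport) -> None:
        self.formatter = formatter
        self.transport = transport

    def publish(self, rc: RecentChange) -> None:
        self.transport(self.formatter(rc))


def notify_all(feeds: Iterable[Feed], rc: RecentChange) -> None:
    """"Here's a RecentChange object, send it out", once per configured feed."""
    for feed in feeds:
        feed.publish(rc)
```

With something like this, the IRC feed and the XML feed would just be two entries in a list, e.g. Feed(format_irc_line, udp_transport("127.0.0.1", 9390)) and Feed(format_xml, udp_transport("127.0.0.1", 9391)); the addresses and ports are placeholders. Adding a third format or a non-UDP transport would not touch the code that detects the change.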
In any case, there should be a UDP packet sent out that contains the XML. This packet has to be received by a small XMPP client which is connected to our XMPP server (or even running as part of the server process, if such a thing is easy to do). It would basically do the same thing our small IRC forwarding script does now: look at the message, see to which wiki it belongs, and forward it to the appropriate channel.
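The receiving side could be very small. The following is only a sketch of the routing step, again in Python: it assumes each packet is one XML element carrying a wiki attribute, and it assumes one chat room per wiki on a placeholder domain. The actual XMPP sending is left as a forward callback, since which XMPP library to use is an open question.

```python
import socket
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def room_for(packet: bytes, domain: str = "conference.example.org") -> Optional[str]:
    """Work out which room a packet belongs to, or None if it is unusable.

    Assumes the root element has a "wiki" attribute; that attribute and the
    room naming scheme are assumptions made for this sketch.
    """
    try:
        root = ET.fromstring(packet.decode("utf-8"))
    except (UnicodeDecodeError, ET.ParseError):
        return None  # truncated or malformed packet: drop it
    wiki = root.get("wiki")
    if not wiki:
        return None
    return f"{wiki}@{domain}"


def serve(host: str, port: int, forward: Callable[[str, str], None]) -> None:
    """Receive UDP packets forever and hand each one to forward(room, xml)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            packet, _sender = sock.recvfrom(65535)
            room = room_for(packet)
            if room is not None:
                forward(room, packet.decode("utf-8"))
```

Here forward would be whatever posts a message to a room on the XMPP server. Unlike the IRC feed, a packet that arrives cut off simply fails to parse and is dropped, instead of being passed on in a half-readable state.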
For now, it's probably easiest to run the IRC feed in parallel to the XMPP feed - just send out two UDP packets and process them separately. In time though, it would be nicer to build the IRC stream from the XML-formatted UDP packets too - or even from the XMPP channel. I suppose Jabber-to-IRC bridges exist.
Also, currently it's probably sufficient to use the simplest form of XMPP messages, which is the Instant Messaging profile known as Jabber. However, if we want to get all fancy, we should look into extending the service to support the PubSub extension of XMPP, which is especially aimed at machine-readable content and provides a lot of options for things like server-side filtering of messages. This would be great for bots that are not interested in all changes, but only in specific types of changes, such as new page creations, page moves, or whatever.
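To illustrate the kind of filtering a bot would want, here is a tiny sketch of the selection logic on its own. It is not PubSub code and says nothing about how a server would implement this; it just assumes each event is an XML element with a type attribute, with values such as "new" or "edit" as assumed examples.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable


def make_type_filter(wanted: Iterable[str]) -> Callable[[str], bool]:
    """Build a predicate that accepts only events whose type is in wanted."""
    wanted_types = frozenset(wanted)

    def matches(event_xml: str) -> bool:
        # The "type" attribute is an assumption about the event format.
        return ET.fromstring(event_xml).get("type") in wanted_types

    return matches
```

A bot that only cares about page creations would build make_type_filter({"new"}) and ignore everything the predicate rejects. With server-side filtering, the same test would run before the message is ever sent to the bot.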
So... who's interested in tackling the "send RC messages as XML" bit? The code to build XML from the Recent Changes log is already there; it's used by the API. It might have to be adjusted to work with RecentChange objects, though; I haven't looked. In any case, the fun bit is designing a decent plug-in architecture that allows for combining formats and transports in a flexible way. Just adding more hard-coded logic for sending out XML wouldn't be so nice.
This sounds like a fun job, and I'm actually tempted to do it myself. I'm currently committed to too many projects already though, so this would have to wait. Again. For who knows how long. So, if you have fun digging around MediaWiki code, go ahead and look at this! Dozens of bot authors are going to be extremely grateful :)
To quote the last thing Brion said to me in Brussels:
Let's make that Jabber shit happen!
Edit: This post is now referenced in a feature request on Bugzilla.
Update (August 2009): After some talk at Wikimania, I started implementing the MediaWiki extension that would emit the XML events over UDP (XMLRC). The bit that would feed that into an XMPP channel is still missing.
Update (November 2009): At the usability meeting, Brion promised to help set up an XMPP server for testing. Yay!