MediaWiki media dreams

From BrightByte

Despite the name, MediaWiki's handling of media files is sadly rudimentary. It is pretty much limited to basic image formats; other files can be uploaded, but the software doesn't really know what to do with them. Support for SVG and DjVu has been added as hard-coded hacks to the core system, making the media handling code even more convoluted than it already was.

I have been thinking about a flexible framework for media-handling plugins for quite some time now. Here's a rough pipe-dream of how things could be done; it can be split into three parts: a generic upload validation hook, media upload handlers, and media presentation handlers. A fourth part, media storage abstraction, is already being worked on by Tim Starling, and is largely independent of my thoughts.

Upload Validation Hook

This is for validation that is to be performed on all uploaded files. This could be implemented using a traditional "hook point", i.e. calling wfRunHooks(), and in fact, such a hook exists, namely UploadVerification. Tasks that could be performed at this point are:

  • virus scanning. There is some code for using programs like F-Prot or ClamAV in the core now, which could be moved to an extension.
  • protection against code injection. Explicit checks for shell scripts, JavaScript files, etc. This is especially important since Internet Explorer tends to interpret anything that remotely looks like HTML, and will also run JavaScript code contained in such files, completely ignoring both the MIME type sent by the server and the file extension. This type of check is currently performed by UploadForm::detectScript, but should be moved to a separate plugin (while still being enabled by default).
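
To make the hook idea concrete, here is a minimal sketch in Python (MediaWiki itself is PHP, and its real mechanism is wfRunHooks()). The names register_hook, run_hooks and detect_script, as well as the list of markers, are assumptions made up for illustration; only the hook name UploadVerification comes from the text above.

```python
from typing import Callable, Dict, List, Optional

# A validator takes (filename, data) and returns an error message, or None if the file is fine.
Validator = Callable[[str, bytes], Optional[str]]

# Hypothetical registry: hook name -> validators, run in registration order.
_hooks: Dict[str, List[Validator]] = {}


def register_hook(name: str, validator: Validator) -> None:
    _hooks.setdefault(name, []).append(validator)


def run_hooks(name: str, filename: str, data: bytes) -> Optional[str]:
    """Run every validator for the hook; stop at the first one that reports an error."""
    for validator in _hooks.get(name, []):
        error = validator(filename, data)
        if error is not None:
            return error
    return None


def detect_script(filename: str, data: bytes) -> Optional[str]:
    """Reject content that looks like a script or HTML, whatever the extension says.

    The marker list is illustrative only, not the set of checks MediaWiki performs.
    """
    head = data[:1024].lower()
    if head.startswith(b"#!"):
        return f"{filename}: content looks like a shell script"
    for marker in (b"<script", b"<html", b"<body"):
        if marker in head:
            return f"{filename}: content looks like HTML or script (found {marker.decode()!r})"
    return None


register_hook("UploadVerification", detect_script)
```

A virus scanner would simply be one more validator registered on the same hook, which is what makes it easy to move such checks out of the core and into extensions.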

Upload Handlers

Media upload handlers are objects implementing the MediaUploadHandler interface, and are registered for one or more file extensions. When a file is uploaded, the upload handler registered for the file's extension will be used to handle that file. The upload handler is supposed to perform the following tasks:

  • check the validity of the uploaded file. This is supposed to replace the messy attempt to somehow guess the uploaded file's MIME type and then compare that to the file's extension (currently handled by the MimeMagic class I wrote). Upload handlers can apply checks specific to a given file format, including looking into zip archives (for zip-based formats like OpenOffice.org documents), looking into XML files (for example to detect references to external files in SVG images) and so on.
  • extract meta-information to store in the database. This should include any information that might be useful for displaying the file to the user, or for showing to the user directly. There are basically two kinds of meta-info:
    • generic info, same for all media formats. This info would go into separate database fields, and could thus also be used for lookups and searches. Fields for this kind of info mostly exist already in the current schema: file size (in bytes), MIME type (major + minor), media type (image, audio, video, etc). Currently, width and height are also treated as generic, but probably shouldn't be.
    • format-specific info, different for different kinds of files. This would be stored as a single blob in the database (probably a serialized PHP array or object). This is already done for EXIF data from JPEG files, but could be extended to include a lot more:
      • width and height for images and video
      • length in seconds for audio and video
      • resolution (bitmaps), sample rate (audio) or frame rate (video)
      • "data flavor" for container formats (for OGG, this could be "vorbis" (audio), "theora" (video), etc).
      • Information from ID3 tags (analogous to EXIF data)
      • and anything else, really.
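
The two tasks above could be sketched roughly as follows, again in Python and not as actual MediaWiki code. Only the interface name MediaUploadHandler comes from the text; the method names, the MediaInfo record, the PNG handler and the registry functions are hypothetical, and JSON stands in for the serialized PHP array.

```python
import json
import struct
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class MediaInfo:
    # Generic info: would live in separate, searchable database fields.
    size: int
    mime_major: str
    mime_minor: str
    media_type: str
    # Format-specific info: would be stored as a single blob.
    format_info: Dict[str, Any] = field(default_factory=dict)

    def blob(self) -> str:
        """Serialize the format-specific part (JSON here, as a stand-in)."""
        return json.dumps(self.format_info, sort_keys=True)


class MediaUploadHandler(ABC):
    @abstractmethod
    def validate(self, data: bytes) -> Optional[str]:
        """Return an error message, or None if the file is valid for this format."""

    @abstractmethod
    def extract_info(self, data: bytes) -> MediaInfo:
        """Extract generic and format-specific meta-information."""


class PngUploadHandler(MediaUploadHandler):
    SIGNATURE = b"\x89PNG\r\n\x1a\n"

    def validate(self, data: bytes) -> Optional[str]:
        # A PNG starts with an 8-byte signature followed by the IHDR chunk.
        if len(data) < 24 or not data.startswith(self.SIGNATURE) or data[12:16] != b"IHDR":
            return "not a valid PNG file"
        return None

    def extract_info(self, data: bytes) -> MediaInfo:
        # Width and height are big-endian 32-bit integers at offsets 16 and 20.
        width, height = struct.unpack(">II", data[16:24])
        return MediaInfo(len(data), "image", "png", "image",
                         {"width": width, "height": height})


# Hypothetical registry: file extension -> handler.
_upload_handlers: Dict[str, MediaUploadHandler] = {}


def register_upload_handler(handler: MediaUploadHandler, *extensions: str) -> None:
    for ext in extensions:
        _upload_handlers[ext.lower()] = handler


def handle_upload(filename: str, data: bytes) -> MediaInfo:
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    handler = _upload_handlers.get(ext)
    if handler is None:
        raise ValueError(f"no upload handler registered for extension {ext!r}")
    error = handler.validate(data)
    if error is not None:
        raise ValueError(f"{filename}: {error}")
    return handler.extract_info(data)


register_upload_handler(PngUploadHandler(), "png")
```

Note that width and height end up in the format-specific blob here, in line with the remark that they probably shouldn't be treated as generic fields.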

Presentation Handlers

Media presentation handlers are objects implementing the MediaPresentationHandler interface, and are registered for one or more MIME types (possibly taking into account the media type, and perhaps even the "data flavor" or other information from the info-blob). They define how a file of a given type is presented to the user: using plain HTML, server-side rendering, browser plugins, Flash- or Java-based players, JavaScript for slideshows, etc. Specifically, a presentation handler would take care of:

  • The (increasingly misnamed) [[Image:xxx]] links - any parameters given in the link (like [[Image:xxx|right|200x300px|some caption]]) would be passed directly to the presentation handler, and would be interpreted by it. This way, different handlers could use different parameters (for example, the "page" to show for multi-page documents, etc).
  • Presentation on the file's description page, plus additional meta-info to show on that page (this is already done for EXIF data)
  • Presentation in <gallery> tags and categories.
  • Determining if [[Media:xxx]] is allowed to link to the file directly, or must link to the description page (for potentially "dangerous" formats).
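
As a rough illustration of the first and last points, the following Python sketch passes the raw link parameters through to whichever handler is registered for the file's MIME type. Only the interface name MediaPresentationHandler and the link syntax come from the text; the method names, the registry, the image handler and the generated HTML are assumptions, and the MIME type is simply passed in where real code would look it up in the database.

```python
import html
import re
from typing import Dict, List, Tuple


class MediaPresentationHandler:
    def render_link(self, name: str, params: List[str]) -> str:
        """Turn a link plus its raw parameters into HTML; each handler interprets them itself."""
        raise NotImplementedError

    def allows_direct_link(self) -> bool:
        """Whether [[Media:xxx]] may link to the file itself; conservative default."""
        return False


class ImagePresentationHandler(MediaPresentationHandler):
    _SIZE = re.compile(r"^(\d+)x(\d+)px$")
    _ALIGN = {"left", "right", "center"}

    def render_link(self, name: str, params: List[str]) -> str:
        attrs = {"src": name}
        caption = ""
        for param in params:
            size = self._SIZE.match(param)
            if size:
                attrs["width"], attrs["height"] = size.group(1), size.group(2)
            elif param in self._ALIGN:
                attrs["class"] = "float" + param
            else:
                caption = param  # anything unrecognized is treated as the caption
        attr_text = " ".join(f'{key}="{html.escape(value)}"' for key, value in attrs.items())
        return f'<img {attr_text} alt="{html.escape(caption)}">'

    def allows_direct_link(self) -> bool:
        return True


# Hypothetical registry: MIME type -> handler.
_presentation_handlers: Dict[str, MediaPresentationHandler] = {}

_LINK = re.compile(r"^\[\[Image:([^|\]]+)((?:\|[^|\]]*)*)\]\]$")


def register_presentation_handler(handler: MediaPresentationHandler, *mime_types: str) -> None:
    for mime_type in mime_types:
        _presentation_handlers[mime_type.lower()] = handler


def parse_image_link(wikitext: str) -> Tuple[str, List[str]]:
    """Split [[Image:name|p1|p2]] into the name and the uninterpreted parameter list."""
    match = _LINK.match(wikitext)
    if match is None:
        raise ValueError(f"not an Image link: {wikitext!r}")
    params = match.group(2).split("|")[1:] if match.group(2) else []
    return match.group(1), params


def render(wikitext: str, mime_type: str) -> str:
    name, params = parse_image_link(wikitext)
    handler = _presentation_handlers.get(mime_type.lower())
    if handler is None:
        raise ValueError(f"no presentation handler registered for {mime_type}")
    return handler.render_link(name, params)


register_presentation_handler(ImagePresentationHandler(), "image/png", "image/jpeg")
```

A handler for multi-page documents could accept a "page" parameter in exactly the same way, without the link parser having to know about it.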


This entry will be updated with more detailed thoughts when I feel like it...
