Distributed Wikipedia Project

From Wikiteer

The goal of the Distributed Wikipedia Project is to solve the current crisis of ever-growing bandwidth and server costs that the Wikimedia Foundation (WMF) faces in hosting the ever-growing Wikipedia and related projects. In the current model the WMF runs a centralized database on a server farm in Florida and some Squid cache servers in Paris, in the Netherlands (Kennisnet) and elsewhere. This model does not seem to scale very well.

A (totally) distributed approach to Wikipedia and related projects might solve that problem (and create a few others). This project page intends to organize efforts to create such an approach.


The idea: reclaim the World Wide Web for starters

Currently, contributors of encyclopedic content to Wikipedia submit their articles to a centralized server owned and run by the Wikimedia Foundation. They could just as well post the article in HTML format to their homepage with their own ISP. A previous version of this page, for example, is on the homepage of Dedalus, linking back to his project page. I believe the same could be done with any Wikipedia article, with the blue underlined wikilinks transformed into real URLs pointing to other articles on other contributors' homepages. So, how do you edit such a page? What about versioning? Editing amounts to submitting a new version, just as it is now, but to a (different) homepage rather than to a centralized server. That would require a clever mechanism to ensure that a link to the original article then points to the updated article on another site. Does anyone have an idea how to solve that problem?
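One way to picture the link-rewriting step is a small renderer that turns `[[Title]]` wikilinks into ordinary HTML links by looking each title up in a table of "latest version" URLs. The Python sketch below is purely hypothetical: the `render_links` function, the registry dict, and the example.org/example.net URLs are illustrations, not part of any existing system, and who would maintain such a registry (and how it would stay up to date) is exactly the open question above. What it does show is the indirection: a link names an article rather than a host, so publishing a new version on another homepage only requires updating one registry entry.

```python
import html
import re

# Matches [[Title]] and [[Title|label]] wikilinks.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def render_links(wikitext: str, registry: dict) -> str:
    """Replace wikilinks with <a> tags pointing at each article's latest URL.

    `registry` is a hypothetical title -> URL table. Titles it doesn't know
    are rendered as plain text. Text outside the links is left untouched.
    """
    def replace(match: re.Match) -> str:
        title = match.group(1).strip()
        label = match.group(2) or title
        url = registry.get(title)
        if url is None:
            return html.escape(label)  # unknown article: no link
        return f'<a href="{html.escape(url, quote=True)}">{html.escape(label)}</a>'

    return WIKILINK.sub(replace, wikitext)


if __name__ == "__main__":
    # Hypothetical registry: the article currently lives on alice's homepage.
    registry = {"Florida": "http://example.org/~alice/Florida.html"}
    print(render_links("See [[Florida]].", registry))
    # prints: See <a href="http://example.org/~alice/Florida.html">Florida</a>.

    # bob publishes an updated version; only the registry entry changes.
    registry["Florida"] = "http://example.net/~bob/Florida.html"
    print(render_links("See [[Florida]].", registry))
```

The pages themselves never need rewriting when an article moves; the hard part, keeping the registry trustworthy and available without a central server, is what the rest of this page tries to address.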

A possible alternative: TiddlyWiki

I don't know if it will eventually work, but TiddlyWiki might be helpful. I've set up a project page in TiddlyWiki here: http://home.planet.nl/~huike017/DistributedWikipediaProject.html

I also found a place to host the TiddlyWiki where it can be edited online (at least by me): http://distributedwikipediaproject.tiddlyspot.com/

My idea detailed - Anthony

Basically, my thought was to have a sort of mirror-in-a-box: a Wikipedia mirror that could be set up (on at least Windows and Linux, for starters) with a few clicks. There are lots of problems with that simplified approach, though, which I think can all be fixed.

  • Hard drive space - I think a reasonable minimum for a participant in the distributed system is to dedicate 1 GB of drive space. That's somewhat high, but it's a good enough number for starters. But 1 GB won't hold the text of the English Wikipedia, let alone all of the projects with images. So we'd have to allow partial mirrors, where each participant takes a particular chunk. This isn't much of a problem, and there are a lot of possible solutions. I currently favor some sort of hashing mechanism, so that each mirror only has to distribute the range of hash keys it is handling.
  • Authentication - With a standard Wikipedia mirror, the host is free to change anything. This is in some ways a good thing, but there should at least be a way to designate some sort of "official" version of a file. The way I see doing this is with digital signatures. At the least, Wikipedia would have a public/private keypair and could sign articles for its users. In a more advanced version each user could have a public/private keypair, and Wikipedia's signature would only be needed upon account creation. The issue is that I'm not sure this can easily be done in standard Java/JavaScript (without resorting to signed applets, which can basically do anything). Maybe I'm wrong there, but if not, there would need to be a special client to handle this. That's OK, though: people could still try out the system without downloading the client; they just couldn't be sure of the authentication. As for the client, a simple proxy-type program that checks the signatures would work.
  • Bandwidth - Current mirrors rely on the main Wikipedia to get their content, which saves Wikipedia little, if any, money. The solution is pretty obvious: the mirrors get their content from other mirrors. With the support of the Wikimedia Foundation, the main Wikipedia site could be scaled back to the point where it only handles signatures. Even without the WMF's support, certain clients/servers could be set up to monitor changes via the IRC channel and download them as they come in.
  • Anonymity - We don't really want mirrors to be able to track users. Fortunately, this one has already been solved: anonymity-conscious users can access Wikipedia through the Tor network. Tor hidden services can be set up to make this less draining on the Tor system, because hidden services don't require open exit nodes. Servers can run the hidden service, the open wiki, or both.
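The hashing idea from the hard-drive-space item could look something like consistent hashing, sketched below in Python. This is one possible scheme, not a settled design: article titles and mirror identifiers are both hashed onto the same 32-bit ring, and each mirror handles the keys between the previous mirror's position (exclusive) and its own (inclusive). The 32-bit key size, the `HashRing` class, and the mirror names are all assumptions for illustration.

```python
import bisect
import hashlib

KEY_BITS = 32  # assumption: a 32-bit key space keeps the numbers readable


def hash_key(name: str) -> int:
    """Map a string (article title or mirror ID) to an integer key."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[: KEY_BITS // 8], "big")  # first 32 bits


class HashRing:
    """Consistent-hashing ring: each mirror owns the keys from the previous
    mirror's position (exclusive) up to its own position (inclusive)."""

    def __init__(self, mirrors):
        self._points = sorted((hash_key(m), m) for m in mirrors)
        self._keys = [k for k, _ in self._points]

    def mirror_for(self, title: str) -> str:
        """Return the mirror responsible for storing/serving `title`."""
        i = bisect.bisect_left(self._keys, hash_key(title))
        # Keys past the last mirror's position wrap around to the first mirror.
        return self._points[i % len(self._points)][1]

    def range_of(self, mirror: str):
        """Return (start, end) for the half-open key range (start, end].

        If start >= end the range wraps past the top of the key space;
        with a single mirror, start == end and it covers the whole ring.
        """
        names = [m for _, m in self._points]
        idx = names.index(mirror)
        start = self._keys[idx - 1]  # idx - 1 == -1 wraps to the last mirror
        return start, self._keys[idx]


if __name__ == "__main__":
    ring = HashRing(["mirror-a.example", "mirror-b.example", "mirror-c.example"])
    owner = ring.mirror_for("Albert Einstein")
    print(owner, ring.range_of(owner))
```

A nice property of this choice is that when a mirror joins or leaves, only the keys in its own range move to a neighbor; a simple "split the key space into N equal slices" scheme would reshuffle almost everything whenever N changes.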
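The authentication item boils down to sign-then-verify. As a toy illustration only, here is textbook RSA in pure Python: the key holder (the WMF, in the simplest version described above) signs a hash of the article text, and a client-side proxy checks the signature using only the public key. The tiny Mersenne-prime key and the missing padding make this completely insecure; the function names are hypothetical, and a real system would use a vetted library and a standard scheme such as RSA-PSS or Ed25519.

```python
import hashlib

# Toy textbook-RSA key built from two Mersenne primes (2^89 - 1, 2^107 - 1).
# Far too small and unpadded to be secure: illustration only.
P = 2**89 - 1
Q = 2**107 - 1
N = P * Q          # public modulus
E = 65537          # public exponent (coprime to (P - 1) * (Q - 1))
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest_int(article_text: str) -> int:
    """SHA-256 of the article, reduced into the modulus (a toy shortcut)."""
    h = hashlib.sha256(article_text.encode("utf-8")).digest()
    return int.from_bytes(h, "big") % N


def sign(article_text: str) -> int:
    """Run by the private-key holder over the 'official' revision."""
    return pow(digest_int(article_text), D, N)


def verify(article_text: str, signature: int) -> bool:
    """Run by the client proxy, which only knows the public key (N, E)."""
    return pow(signature, E, N) == digest_int(article_text)


if __name__ == "__main__":
    text = "Kennisnet hosts some of the Wikimedia cache servers."
    sig = sign(text)
    print(verify(text, sig))                  # official copy from any mirror
    print(verify(text + " (edited)", sig))    # a mirror altered the text
```

The point of the design is that the proxy never has to trust the mirror it downloaded from: any copy whose signature checks out against the published public key is the official version, whichever mirror served it.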

Initial implementation: I'm thinking of a VMware Player virtual machine running Linux as the guest. The advantage is speed of design. Almost all the software to do this already exists, so the problem becomes one of configuring a system. The significant disadvantage is that it relies on proprietary freeware, but this would only be for a reference implementation. The protocol would work with a reimplemented client/server that doesn't rely on VMware, and work could begin on that once the virtual machine was working. Eventually no one would have to use the initial implementation; just look at all the different implementations of BitTorrent, for instance.

QEMU has now progressed to the point where it is viable for the initial implementation. It's currently slow, but should be usable for web hosting. And unlike VMware, it requires no installation: you could put it on a USB drive and run it from anywhere.

This is a rather quick dump of what I've been thinking about on and off over the course of years. Please ask me questions, and let's work out a more detailed design. Or maybe you want to take things in a totally different direction? I haven't seen TiddlyWiki yet. Maybe they've already solved these problems and it's the way to go...


(Feel free to add your name and a link to your homepage.)

