How To: Install Wikipedia For Offline Access

on Friday, 07 December 2012

August 31, 2006
In the old days (say, around 1990) a must-have application when buying a computer was an encyclopedia on a CD-ROM. Hello, Grolier’s and Encarta! No more would you need a shelf full of books to look up interesting facts! When I bought an iBook, it came with a copy of World Book, which I thought quite an entertaining addition.
These days, such an addition is no longer the norm, thanks to the Internet. An incredible amount of information can be gleaned online with a quick search. However, a project started a few years ago has rapidly risen to become a great resource for user-provided information on a wide variety of topics. I speak, of course, of Wikipedia. While it was initially just a quick repository for user feedback, it has become a resource worthy of comparison to more established sources, such as the Encyclopedia Britannica, even if its veracity may be in question.
I have a laptop but don’t always have an Internet connection, so I wondered: why can’t I have an offline copy of Wikipedia? As it turns out, I can. Now, if I’m on the road and want to look up something quickly, I don’t even have to find a hotspot; I can just turn on my laptop, pull up a browser, and find the answer. This article shows you how I did it.
Overview

Wikipedia runs on the open source software MediaWiki, which in turn runs on top of MySQL and PHP, usually on Linux and Apache. My laptop runs Windows XP Professional SP2 Tablet PC Edition, so running Linux just wasn’t going to happen. Fortunately, there is a WAMP project (Windows – Apache – MySQL – PHP) that does all the hard work of installing that stack for me. So, all I’d have to do is:
  • Install WAMP.
  • Install MediaWiki.
  • Download and install a pages dump of Wikipedia.
These instructions should in theory work for any Windows XP SP2 machine. However, your results may vary. I take no responsibility if you try this yourself! Some anticipated caveats:
  • You need Administrator privileges. You’re installing software, as well as creating services, so you need the privileges.
  • You need disk space. The full English Wikipedia will take over 10 gigabytes when uncompressed into the database.
  • You need NTFS. The database files themselves may grow larger than 4 GB, the largest file FAT32 can hold, so if you’re using FAT32, you’re out of luck.
  • You’re installing a new service. By default, the server installs without remote access, and hopefully, you leave your firewall in place. However, you are still installing new services on your machine, which means they have the potential for exploitation.
  • No pictures included. These instructions do not cover the images in Wikipedia.
That said, let’s get on with the show!
Install WAMP.

[Screenshot: WAMP setup wizard]
Go to the Wampserver site and download the latest WAMP distribution (in my case, 1.6.4). Double-click the executable to run, and the defaults will pretty much be what you want. (E.g., install to C:\wamp, create a Start Menu group, do not auto-start, set DocumentRoot to www, and Launch immediately.)
A Windows Security Alert will probably pop up and ask if you want to keep blocking Apache HTTP Server. You want to select “Keep Blocking” for this question.
[Screenshot: Windows Security Alert for Apache HTTP Server]
Now, in your systray on the lower right side you should see a little dashboard icon with a lock on it. It should be white, and when you mouse over it, it should say “WAMP5 – All services running – server Offline”. (“Offline” here actually means that the server only accepts connections from localhost. Technically, it is online.)
To verify that it’s working, open up a web browser, and point it at http://127.0.0.1/. If the installation was successful, you should see a page that looks like the following:
[Screenshot: WAMP welcome page at http://127.0.0.1/]
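If you’d rather script this check, for example after a reboot, the Python sketch below does the same thing as pointing a browser at http://127.0.0.1/. It sends an HTTP GET request and reports whether a 2xx response came back. It assumes Python 3 is installed, which the setup above does not otherwise need. The function name server_responds is made up for illustration.

```python
import urllib.error
import urllib.request


def server_responds(url: str = "http://127.0.0.1/", timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET to `url` comes back with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Refused connection, timeout, or an HTTP error status
        # (HTTPError is a subclass of URLError).
        return False
```

While WAMP is running, server_responds() should return True. Passing http://127.0.0.1/wikipedia/ checks the wiki itself once it is installed.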
That’s it for WAMP!
Install MediaWiki.

First, we’ll set up a MySQL user for the wiki. To do so, make sure WAMP is running. (If not, go to Start->Programs->WampServer->Start Wampserver.) Then, go to phpMyAdmin. Click on “Privileges”, then “Add a new User”. Use the following values:
  • User name: wikiuser
  • Host: Select “Local” from the dropdown
  • Password: Select “Use text field” from the dropdown, and pick a password of your choice
  • Generate Password: Alternatively, click the “Generate” button to have phpMyAdmin pick a random password, and write it down
  • Global privileges: Leave all unchecked
Scroll to the bottom and click “Go”, and it should successfully create the user. The confirmation page also lets you edit the new user: scroll down to the section marked “Database-specific privileges”, set the dropdown to “Use text field”, enter “wikidb”, and click “Go”.
[Screenshot: phpMyAdmin database-specific privileges]
You should be presented with a new page for Database-specific privileges. Click the “Check All” link to check all the boxes, and click “Go”.
[Screenshot: editing the database-specific privileges for wikidb]
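The phpMyAdmin clicks above come down to roughly one GRANT statement. The Python sketch below generates a random password, much like the “Generate” button, and prints that statement so you could paste it into the MySQL console as root instead. It is a hypothetical helper, not part of WAMP or phpMyAdmin. It assumes a MySQL version of this era, where GRANT ... IDENTIFIED BY also creates the user. MySQL 8.0 removed that form, so there you would need CREATE USER first. Note that ALL PRIVILEGES does not include the GRANT option, which phpMyAdmin’s “Check All” also ticks.

```python
import secrets
import string


def make_password(length: int = 16) -> str:
    """Random password of letters and digits only (nothing that needs SQL escaping)."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def wiki_user_sql(password: str, user: str = "wikiuser", db: str = "wikidb") -> str:
    """GRANT statement for a localhost-only user with all privileges on one database."""
    if not password.isalnum():
        # Keeps the sketch simple: no quotes or backslashes to escape.
        raise ValueError("use a letters-and-digits password with this sketch")
    return (f"GRANT ALL PRIVILEGES ON {db}.* TO '{user}'@'localhost' "
            f"IDENTIFIED BY '{password}';")


if __name__ == "__main__":
    pw = make_password()
    print("Password (enter it again in the MediaWiki installer):", pw)
    print(wiki_user_sql(pw))
```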
Download the latest stable release of MediaWiki. At the time of this writing, that was version 1.7.1. It’s a .tar.gz file, so you’ll need a program to expand it; I recommend the shareware program WinRAR. Unpacking it creates a folder named mediawiki-1.7.1. Rename this to wikipedia, and move it to C:\wamp\www.
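If you don’t have WinRAR, Python’s standard tarfile module can do the unpacking. This sketch extracts the tarball into a scratch directory, then moves its single top-level folder (mediawiki-1.7.1 here) to C:\wamp\www\wikipedia. That covers the rename and the move in one step. The function name is hypothetical, and so is the assumption that the archive holds exactly one top-level folder. Neither comes from MediaWiki.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def install_mediawiki(tarball: str, www_root: str = r"C:\wamp\www",
                      target_name: str = "wikipedia") -> Path:
    """Extract `tarball` and move its one top-level folder to www_root/target_name."""
    destination = Path(www_root) / target_name
    if destination.exists():
        raise FileExistsError(f"{destination} already exists")
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(tarball, "r:gz") as archive:
            if hasattr(tarfile, "data_filter"):
                archive.extractall(scratch, filter="data")  # newer Pythons: reject unsafe paths
            else:
                archive.extractall(scratch)  # older Pythons: only use trusted archives
        folders = [p for p in Path(scratch).iterdir() if p.is_dir()]
        if len(folders) != 1:
            raise ValueError(f"expected one top-level folder, found {len(folders)}")
        # Same as renaming mediawiki-1.7.1 to wikipedia and moving it into www.
        shutil.move(str(folders[0]), str(destination))
    return destination
```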
If you now visit http://127.0.0.1/wikipedia/, you should get a splash page telling you to set up the wiki first. Follow that link, and you should get a “Site config” page. I used these values for this form:
  • Wiki name: Wikipedia Offline
  • (Admin) Password: custom password
  • Database host: localhost
  • Database name: wikidb
  • DB username: wikiuser
  • DB password: same password used when creating MySQL user
The other defaults were fine. Once setup finished, I used Windows Explorer to go to C:\wamp\www\wikipedia\config and moved the file LocalSettings.php up one directory, to C:\wamp\www\wikipedia.
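The LocalSettings.php move is also easy to script. Here is a minimal Python sketch that assumes the default paths used above. The helper name is hypothetical. It refuses to overwrite an existing LocalSettings.php, so running it twice can’t clobber a working configuration.

```python
import shutil
from pathlib import Path


def promote_local_settings(wiki_dir: str = r"C:\wamp\www\wikipedia") -> Path:
    """Move wiki_dir/config/LocalSettings.php up to wiki_dir/LocalSettings.php."""
    source = Path(wiki_dir) / "config" / "LocalSettings.php"
    target = Path(wiki_dir) / "LocalSettings.php"
    if not source.is_file():
        raise FileNotFoundError(f"the installer has not written {source} yet")
    if target.exists():
        raise FileExistsError(f"{target} already exists; not overwriting it")
    shutil.move(str(source), str(target))
    return target
```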
Another check of http://127.0.0.1/wikipedia/ should state that “MediaWiki has been successfully installed.”
Download and install a pages dump of Wikipedia.

You can download a copy of the English Wikipedia pages from http://download.wikimedia.org/enwiki/latest/. However, first check the status of the latest “enwiki” dump on the Wikimedia download site to make sure it completed successfully. The file you want is named enwiki-latest-pages-articles.xml.bz2. It contains all the article pages, but none of the revisions or history. You just want the articles, right? As of this writing, that file is around 1.5 GB compressed.
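The dump is large, so it’s worth confirming the download is intact before spending a day importing it. The Python sketch below streams the .bz2 file through a decompressor and an incremental XML parser and counts the page elements. Memory use stays roughly flat however big the file is. A truncated download raises an EOFError or a parse error instead of returning a count. It assumes the dump follows the MediaWiki export format. The function name is made up, and it will take a long time on a 1.5 GB file.

```python
import bz2
import xml.etree.ElementTree as ET


def count_pages(dump_path: str) -> int:
    """Count <page> elements in a .xml.bz2 dump without loading it all into memory."""
    pages = 0
    with bz2.open(dump_path, "rb") as stream:
        events = ET.iterparse(stream, events=("start", "end"))
        _, root = next(events)  # the first "start" event is the <mediawiki> root
        for event, elem in events:
            # Tags carry the export namespace, e.g. "{http://www.mediawiki.org/xml/export-0.3/}page".
            if event == "end" and (elem.tag == "page" or elem.tag.endswith("}page")):
                pages += 1
                root.clear()  # discard finished pages so memory stays roughly flat
    return pages
```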
Make sure you have Java installed; if you don’t, you can get it from http://java.sun.com/j2se/1.5.0/download.jsp. To check, open a command window, type java -version, and press Enter. If it prints a version number, Java is installed. If Windows says “java” is not recognized as a command, Java is either not installed or not on your PATH.
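The same check can be scripted. This Python sketch looks for java on the PATH and returns the first line of its version banner, or None if it isn’t found. Note that java -version writes to stderr rather than stdout. The helper name and its parameters are illustrative only.

```python
import shutil
import subprocess
from typing import Optional


def tool_version(executable: str = "java", flag: str = "-version") -> Optional[str]:
    """First line printed by `executable flag`, or None if it is not on the PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, flag], capture_output=True, text=True, timeout=30)
    # `java -version` prints its banner on stderr; many other tools use stdout.
    output = result.stderr.strip() or result.stdout.strip()
    return output.splitlines()[0] if output else None
```

Calling tool_version() with no arguments checks Java. A None result means it still needs installing.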
You’ll also need MWDumper. Download mwdumper.jar from http://download.wikimedia.org/tools/. Put this file and the wiki dump file in the same directory, say, C:\tmp.
You’ll need to edit MySQL’s config file to increase the max_allowed_packet size. If you don’t, the import will most likely choke around the 49,000-article mark. This is quite annoying, because it kills the rest of the import. While you’re at it, you might as well change innodb_log_file_size, which should modestly increase the import speed. To do so, go to C:\wamp\mysql, right-click on my.ini, and select Open. This opens the ini file in a text editor. Find the innodb_log_file_size line and set it to 512M (it was 10M in mine). Then scroll to the bottom and add the following line, making sure it ends up in the [mysqld] section, which holds the server’s settings:
set-variable=max_allowed_packet=32M
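If you’d rather make the my.ini changes programmatically, here is a line-based Python sketch that edits the file’s text. It sets innodb_log_file_size and adds or replaces max_allowed_packet inside the [mysqld] section, leaving comments and other sections alone. The function name is hypothetical. It writes the plain max_allowed_packet=32M form, because the set-variable= prefix above is older syntax for the same setting that later MySQL releases dropped. One caveat to check in the MySQL manual for your version: on older releases, InnoDB may refuse to start after innodb_log_file_size changes unless the old ib_logfile* files are moved aside after a clean shutdown.

```python
import re

SECTION = re.compile(r"^\s*\[([^\]]+)\]")
LOG_SIZE = re.compile(r"^\s*innodb_log_file_size\s*=")
PACKET = re.compile(r"^\s*(set-variable\s*=\s*)?max_allowed_packet\s*=")


def tune_my_ini(text: str, log_size: str = "512M", packet: str = "32M") -> str:
    """Return `text` with both settings applied inside the [mysqld] section."""
    out = []
    section = None
    packet_done = False

    def close_mysqld() -> None:
        # Called when [mysqld] ends: add the packet line if it wasn't already there.
        nonlocal packet_done
        if section == "mysqld" and not packet_done:
            out.append(f"max_allowed_packet={packet}")
            packet_done = True

    for line in text.splitlines():
        header = SECTION.match(line)
        if header:
            close_mysqld()
            section = header.group(1).strip().lower()
        elif section == "mysqld" and LOG_SIZE.match(line):
            line = f"innodb_log_file_size={log_size}"
        elif section == "mysqld" and PACKET.match(line):
            line = f"max_allowed_packet={packet}"  # also rewrites the old set-variable form
            packet_done = True
        out.append(line)
    close_mysqld()  # [mysqld] may be the last section in the file
    return "\n".join(out) + "\n"
```

To apply it, read my.ini, pass its contents through tune_my_ini, and write the result back. Keep a backup copy first.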
Remember that little dashboard with the lock in your systray? Left-click on it, and a menu should pop up. Select MySQL->Stop Service. Wait a few seconds, then left-click on it again, and select MySQL->Start / Resume Service.
Before you import, you need to delete the data that the default installation put into the wiki database. Otherwise, you’ll get a duplicate-key error right at the start, and none of the rows will import. Left-click on the dashboard with a lock in your systray, go to “MySQL”, and select “MySQL console”. You’ll be asked for a password, which by default is blank, so just press Enter. Then enter the following commands into the console:
use wikidb;
delete from page;
delete from revision;
delete from text;
quit
[Screenshot: deleting the default data in the MySQL console]
This will delete all pages in the wiki.
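The console session above can also be driven from a script by piping the statements into the mysql command-line client. This is a hedged Python sketch that assumes WAMP’s default client path and the blank root password mentioned above. The helper name is hypothetical. By default it only returns the command and SQL it would use. It runs them when called with run=True.

```python
import subprocess
from typing import List, Tuple

RESET_SQL = "DELETE FROM page;\nDELETE FROM revision;\nDELETE FROM text;\n"


def reset_wiki_tables(mysql_exe: str = r"C:\wamp\mysql\bin\mysql.exe",
                      user: str = "root", database: str = "wikidb",
                      run: bool = False) -> Tuple[List[str], str]:
    """Return (command, sql) for emptying the wiki tables; execute them if run=True."""
    # No -p flag: WAMP's root password is blank by default (assumption from the text).
    command = [mysql_exe, "-u", user, database]
    if run:
        # Same effect as typing the three statements into the MySQL console.
        subprocess.run(command, input=RESET_SQL, text=True, check=True)
    return command, RESET_SQL
```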
Open a command window by going to Start->Run and typing cmd. Type c: to change to the C: drive, and then cd C:\tmp to change to the directory where you put mwdumper.jar and the wiki dump file. You’re ready to do the import, but beware: this will take a long time. It’s best to start the process, then leave for a few hours. When you’re ready, type the following:
java -jar mwdumper.jar --format=sql:1.5 enwiki-latest-pages-articles.xml.bz2 | c:\wamp\mysql\bin\mysql -u wikiuser -p wikidb
[Screenshot: the import starting]
This will begin the import process, and as noted, this will take a long time. There are over three million pages to process, so don’t expect it to finish right away. On a reasonably fast single processor machine (*not* my laptop), it took me over 24 hours.
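The | in the import command lets mwdumper stream SQL straight into MySQL without writing a multi-gigabyte intermediate file. For anyone curious how that works, the Python sketch below wires up the same kind of pipe with subprocess: the producer’s stdout becomes the consumer’s stdin. The command lists mirror the command above and assume you run it from C:\tmp. The function name is hypothetical. mysql’s -p flag should still prompt for the password on the console rather than reading it from the pipe.

```python
import subprocess
from typing import Sequence


def run_pipeline(producer: Sequence[str], consumer: Sequence[str]) -> int:
    """Pipe producer's stdout into consumer's stdin, like `producer | consumer`."""
    upstream = subprocess.Popen(list(producer), stdout=subprocess.PIPE)
    downstream = subprocess.Popen(list(consumer), stdin=upstream.stdout)
    upstream.stdout.close()  # so the producer gets a broken pipe if the consumer exits early
    downstream_rc = downstream.wait()
    upstream_rc = upstream.wait()
    # Non-zero if either side failed; the producer's exit code takes precedence.
    return upstream_rc or downstream_rc


DUMP_CMD = ["java", "-jar", "mwdumper.jar", "--format=sql:1.5",
            "enwiki-latest-pages-articles.xml.bz2"]
MYSQL_CMD = [r"c:\wamp\mysql\bin\mysql", "-u", "wikiuser", "-p", "wikidb"]
```

Calling run_pipeline(DUMP_CMD, MYSQL_CMD) should return 0 only if both mwdumper and mysql exit cleanly.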
Usage

Using Wikipedia Offline is pretty straightforward. If you haven’t already, start WAMP. (If you see the dashboard with a lock icon in your systray, and it’s white, then it’s running. If not, go to Start->Programs->WampServer->Start Wampserver.) Then, just fire up a web browser and browse to http://127.0.0.1/wikipedia/. If all goes well, it should be accessible just like Wikipedia, searches and all.
[Screenshot: Wikipedia Offline running in a browser]
Anticipated Questions

  • Why do this?
    I’m not always connected to the Internet, and think Wikipedia is a great resource. Now I can take it wherever I want. I suppose if I were paranoid about Wikipedia tracking my searches, then I could do this and do all the searches I wanted offline. Doing it this way also seemed like a fun tech project.
  • Is this legal?
    Sure! Wikipedia offers all of their data for use by interested parties. All of the software involved is open source, except for Windows.
  • Where are the pictures?
    You can download a dump of the English Wikipedia images separately. Wikipedia doesn’t package these with the pages dump for two reasons: 1) the images might be copyrighted, so they don’t want to distribute them; and 2) the dump file would be huge. As of this writing, the image dump is about 75 GB, which is larger than the hard drive on my laptop.
  • Isn’t it overkill to install full MediaWiki?
    Yes, but it’s not nearly as much effort as you might think. Plus, with WAMP, I can experiment with other types of LAMP-based software. You can always build static pages if you’d prefer something a little more lightweight.
  • Won’t the data fall out of date?
    Yes, but I’m doing this more to just have a quick reference, rather than something that’s kept constantly up to date. In that sense, it’s similar to those encyclopedia CD-ROMs! Besides, the way the dumps are handled, you’re guaranteed to be slightly out of date. If you really need to be that current, you should probably be going online.
  • How can updates be done?
    I presume I can just go into MySQL and delete everything from the ‘page’, ‘revision’, and ‘text’ tables, download a new dump, and re-import using mwdumper.jar. I haven’t actually tried this, though.
  • Can I use these instructions to run a wiki web site?
    There are a few problems with this. First, the WAMP folks note that “WAMP5 is not meant to be a production server.” Also, running a web site takes a fair bit of security knowledge to prevent hacking, so you’ll get yourself in trouble if you just use this setup to publish on the ’net. Finally, you can’t republish the contents of Wikipedia as your own site. So, technically, you could use the first few parts to set up a wiki, but it’s not a good idea. You’re better off getting a proper web host that offers a one-click install of MediaWiki, such as DreamHost.
  • How do I uninstall?
    If you want to trash the whole thing, go to “Add or Remove Programs” in the Control Panel, and select WAMP5. Remove it, then be sure to delete C:\wamp as well.

Source: http://blog.onetechnical.com/2006/08/31/how-to-install-wikipedia-for-offline-access/
© Geazzy Corner All Rights Reserved