August 31, 2006
In the old days (say, around 1990) a must-have application
when buying a computer was an encyclopedia on a CD-ROM. Hello,
Grolier’s and Encarta! No more would you need a shelf full of books to
look up interesting facts! When I bought an iBook, it came with a copy
of World Book, which I thought quite an entertaining addition.
These days, such an addition is no longer the norm, thanks to the
Internet. An incredible amount of information can be gleaned online
with a quick search. However, a project started a few years ago has
risen to become a great resource for user-provided information
on a wide variety of topics. I speak, of course, of
Wikipedia. While initially just a quick repository for user feedback, it has become a resource
worthy of comparison to more established sources, such as
The Encyclopedia Britannica, even if its
veracity may be in question.
I have a laptop but don’t always have an Internet connection, so I
wondered: why can’t I have an offline copy of Wikipedia? As it turns
out, I can. Now, if I’m on the road and want to look up something
quickly, I don’t even have to find a hotspot. I can just turn on my
laptop, pull up a browser, and find the answer. This article shows you
how I did it.
Overview
Wikipedia runs on the open source software MediaWiki,
which in turn runs on top of MySQL and PHP, typically on Linux
and Apache. My laptop runs Windows XP Professional SP2 Tablet PC
Edition, so running Linux just wasn’t going to happen.
Fortunately, there is a WAMP project (Windows – Apache – MySQL – PHP), which did all the hard work of that installation for me. So, all I’d have to do is:
- Install WAMP.
- Install MediaWiki.
- Download and install a pages dump of Wikipedia.
These instructions should in theory work for any Windows XP SP2 machine. However, your results may vary.
I take no responsibility if you try this yourself! Some anticipated caveats:
- You need Administrator privileges. You’re installing software, as well as creating services, so you need the privileges.
- You need disk space. The full English Wikipedia will take over 10 gigabytes when uncompressed into the database.
- You need NTFS. The database files themselves may grow larger than 4 GB, the largest single file FAT32 can hold. If you’re using FAT32, you’re out of luck.
- You’re installing a new service. By default, the
server installs without remote access, and hopefully, you leave your
firewall in place. However, you are still installing new services on
your machine, which means they have the potential for exploitation.
- No pictures included. These instructions do not cover the images in Wikipedia.
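If you’d rather check the disk-space caveat than eyeball it, here is a small sketch. It assumes Python 3 is installed, which nothing else in this setup needs, and the 12 GB threshold is my own assumption: the 10 GB database estimate above plus room for the roughly 1.5 GB compressed dump. It does not check for NTFS; you still need to confirm that yourself.

```python
import shutil

# Assumption: ~10 GB for the database plus ~1.5 GB for the compressed dump,
# rounded up. Raise this as the dumps grow.
REQUIRED_BYTES = 12 * 1024**3


def has_enough_space(path, required_bytes=REQUIRED_BYTES):
    """Return True if the drive holding `path` has at least `required_bytes` free."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes
```

On the laptop you would call `has_enough_space("C:\\")` for the drive WAMP will live on.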
That said, let’s get on with the show!
Install WAMP.
Go to the Wampserver site
and download the latest WAMP distribution (in my case, 1.6.4).
Double-click the executable to run, and the defaults will pretty much be
what you want. (E.g., install to C:\wamp, create a Start Menu group, do
not auto-start, set DocumentRoot to www, and Launch immediately.)
A Windows Security Alert will probably pop up and ask if you want to
keep blocking Apache HTTP Server. You want to select “Keep Blocking”
for this question.
Now, in your systray on the lower right side you should see a little
dashboard icon with a lock on it. It should be white, and when you
mouse over it, it should say “WAMP5 – All services running – server
Offline”. (When they say “offline” here, they actually mean that the server
is restricting access to localhost; technically, it’s online.)
To verify that it’s working, open up a web browser and point it at
http://127.0.0.1/. If the installation was successful, you should see WAMP’s default start page.
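The same check can be scripted, which is handy later when you just want to know whether the server is up before opening a browser. This is a sketch assuming Python 3; it only reports whether the URL answers with HTTP 200, and it deliberately bypasses any configured proxy since the server is local.

```python
import urllib.error
import urllib.request


def server_is_up(url="http://127.0.0.1/", timeout=5):
    """Return True if `url` answers with HTTP 200, False otherwise."""
    # An empty ProxyHandler makes sure localhost is contacted directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP errors all count as "not up".
        return False
```

Once MediaWiki is installed, `server_is_up("http://127.0.0.1/wikipedia/")` checks the wiki the same way.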
That’s it for WAMP!
Install MediaWiki.
First, we’ll set up a MySQL user for the wiki. To do so, make sure WAMP
is running. (If not, go to Start->Programs->WampServer->Start
Wampserver.) Then, go to phpMyAdmin. Click on “Privileges”, then “Add a new User”. Use the following values:
- User name: wikiuser
- Host: Select “Local” from the dropdown
- Password: Select “Use text field” from the dropdown, and pick a password of your choice
- Generate Password: Optional, and not needed if you picked your own password above. Either way, note the password down; you’ll need it again when configuring MediaWiki
- Global privileges: Leave all unchecked
Scroll to the bottom and click “Go”, and it should successfully
create a user. The confirmation page also lets you edit the user: scroll
down to the section marked “Database-specific privileges”, set the
dropdown to “Use text field”, and enter “wikidb”. Click “Go”.
You should be presented with a new page for Database-specific
privileges. Click the “Check All” link to check all the boxes, and
click “Go”.
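For reference, the phpMyAdmin steps above amount to roughly one SQL statement, which you could also paste into the MySQL console. The sketch below (Python 3, my own helper, not part of phpMyAdmin) just builds that statement. It uses the MySQL 5.0-era `GRANT ... IDENTIFIED BY` form, which creates the user as a side effect; newer MySQL versions require a separate `CREATE USER` first. Note that “Check All” in phpMyAdmin may also tick the GRANT option, which `ALL PRIVILEGES` does not include.

```python
def grant_sql(user="wikiuser", password="change-me", database="wikidb", host="localhost"):
    """Build a MySQL 5.0-style GRANT roughly equivalent to the phpMyAdmin steps."""
    # Escape backslashes and single quotes so the password is a valid SQL string
    # literal. (user, host, and database are assumed to be plain identifiers.)
    escaped = password.replace("\\", "\\\\").replace("'", "\\'")
    return (
        f"GRANT ALL PRIVILEGES ON `{database}`.* "
        f"TO '{user}'@'{host}' IDENTIFIED BY '{escaped}';"
    )
```

For example, `print(grant_sql(password="s3cret"))` prints the statement for the values used in this article, with your own password substituted.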
Download the latest stable release of
MediaWiki.
At the time of this writing, that was version 1.7.1. It’s a .tar.gz
file, so you’ll need a program to expand it; I recommend the shareware
program WinRAR. When you unpack this, you’ll create a folder named
mediawiki-1.7.1. Rename this to wikipedia, and move it to
c:\wamp\www.
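If you don’t have WinRAR handy, Python’s standard `tarfile` module can do the unpack-and-rename step. This is a sketch assuming Python 3 and the folder names from the 1.7.1 release; `install_mediawiki` is a hypothetical helper name, and you should change `unpacked_name` to match whichever release you downloaded.

```python
import tarfile
from pathlib import Path


def install_mediawiki(tarball, www_dir, unpacked_name="mediawiki-1.7.1", target_name="wikipedia"):
    """Unpack a MediaWiki .tar.gz into www_dir and rename its top-level folder."""
    www = Path(www_dir)
    target = www / target_name
    # Refuse to clobber an existing wiki folder.
    if target.exists():
        raise FileExistsError(f"{target} already exists; move it aside first")
    # Only unpack archives you trust: extractall writes whatever paths the archive holds.
    with tarfile.open(tarball, "r:gz") as tar:
        tar.extractall(www)
    (www / unpacked_name).rename(target)
    return target
```

Usage would be something like `install_mediawiki("mediawiki-1.7.1.tar.gz", r"c:\wamp\www")`, run from the folder holding the download.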
If you now visit http://127.0.0.1/wikipedia/,
you should get a splash page saying to “set up the wiki” first. Follow
that link, and you should get a “Site config” page. I used these values
for this form:
- Wiki name: Wikipedia Offline
- (Admin) Password: custom password
- Database host: localhost
- Database name: wikidb
- DB username: wikiuser
- DB password: same password used when creating the MySQL user
The other defaults were fine. Once done, I went in Windows Explorer to
C:\wamp\www\wikipedia\config and moved the file LocalSettings.php
up one directory to C:\wamp\www\wikipedia.
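The file move can also be scripted. Here is a small sketch, assuming Python 3 and the folder layout above; `promote_local_settings` is a hypothetical helper name. It fails loudly if the setup form hasn’t produced the file yet, rather than silently doing nothing.

```python
import shutil
from pathlib import Path


def promote_local_settings(wiki_dir):
    """Move config/LocalSettings.php up one level into the wiki directory."""
    wiki = Path(wiki_dir)
    src = wiki / "config" / "LocalSettings.php"
    dst = wiki / "LocalSettings.php"
    if not src.exists():
        raise FileNotFoundError(f"{src} not found; did the setup form finish?")
    shutil.move(str(src), str(dst))
    return dst
```

On the laptop that would be `promote_local_settings(r"C:\wamp\www\wikipedia")`.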
Another check of
http://127.0.0.1/wikipedia/ should state that “MediaWiki has been successfully installed.”
Download and install a pages dump of Wikipedia.
You can download a copy of the English Wikipedia pages from
http://download.wikimedia.org/enwiki/latest/.
However, you should first check the entry for “enwiki” on Wikimedia’s dump status page to make sure the dump completed successfully. The file you will want is named
enwiki-latest-pages-articles.xml.bz2.
This contains all the article pages, but none of the revisions or
history. You just want the articles, right? As of this writing, that
file is around 1.5 GB, compressed.
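Before spending a day on the import, it’s worth confirming the download isn’t corrupt. Wikimedia’s dump directories have generally carried an MD5 checksum list next to the files; I’m assuming there is one for the dump you grab, and its exact file name may vary. This Python 3 sketch computes the digest in 1 MB chunks, so the 1.5 GB file never has to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Compare `md5_of_file("enwiki-latest-pages-articles.xml.bz2")` against the published value for that file before starting the import.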
If you don’t already have Java installed, you can get it from
http://java.sun.com/j2se/1.5.0/download.jsp. To check, I open a command window, type
java -version
and hit Enter. If Java is installed, it prints the installed version.
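The same check in script form: a Python 3 sketch that looks for `java` on the PATH and parses the `java -version` banner, which Java writes to stderr rather than stdout. Treating 1.5 as the minimum is my assumption, based on the version linked above.

```python
import re
import shutil
import subprocess


def parse_java_version(text):
    """Extract (major, minor) from `java -version` output, e.g. '1.5.0_07' -> (1, 5)."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', text)
    if not match:
        return None
    return int(match.group(1)), int(match.group(2) or 0)


def installed_java_version():
    """Return (major, minor) of the java on PATH, or None if it is missing."""
    if shutil.which("java") is None:
        return None
    # `java -version` prints its banner to stderr.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_version(result.stderr)
```

A result of `(1, 5)` or higher (tuples compare element by element, so `(17, 0) >= (1, 5)` holds too) should be fine; `None` means Java isn’t on the PATH.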
You’ll also need MWDumper. Download mwdumper.jar
from http://download.wikimedia.org/tools/. Put this file and the wiki dump file in the same directory, say,
c:\tmp.
You’ll need to edit MySQL’s config file to increase the
max_allowed_packet size. If you don’t, the import will most likely
choke around the 49,000 article mark. This is quite annoying, because
it kills the rest of the import. While you’re at it, you might as well
change innodb_log_file_size, which should modestly increase the
import speed. To do so, go to c:\wamp\mysql, right-click on my.ini,
and select Open. This will open up the ini file in a text editor. Find the line
innodb_log_file_size and set it to 512M (it was 10M in mine). Then add the
following line to the [mysqld] section; the server only reads its settings
from there, so make sure the line doesn’t end up under a different section
header at the bottom of the file:
set-variable=max_allowed_packet=32M
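If you’d rather not hand-edit the file, here is a Python 3 sketch of the same two changes. It assumes the setting appears as a plain `innodb_log_file_size=...` line; a commented-out or `set-variable = innodb_log_file_size=...` form would be left alone. The `set-variable=` syntax is what this article uses with the MySQL bundled in WAMP; later MySQL releases dropped it in favour of a plain `max_allowed_packet=32M` line. Back up my.ini before writing the result over it.

```python
import re


def patch_my_ini(text, log_size="512M", packet="32M"):
    """Return my.ini text with the two import-related settings applied.

    innodb_log_file_size is rewritten wherever it appears, and the
    max_allowed_packet line is appended at the end of the [mysqld] section.
    """
    packet_line = f"set-variable=max_allowed_packet={packet}"
    out = []
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            if section == "[mysqld]":
                out.append(packet_line)  # close out [mysqld] before the next header
            section = stripped.lower()
        elif re.match(r"innodb_log_file_size\s*=", stripped):
            line = f"innodb_log_file_size={log_size}"
        out.append(line)
    if section == "[mysqld]":
        out.append(packet_line)  # [mysqld] was the last section in the file
    return "\n".join(out) + "\n"
```

Read the file, pass its text through `patch_my_ini`, and write the result back, then restart MySQL as described next.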
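Remember that little dashboard with the lock in your systray?
Left-click on it, and a menu should pop up. Select MySQL->Stop
Service. Wait a few seconds, then left-click on it again, and select
MySQL->Start / Resume Service. If MySQL won’t start again, the
innodb_log_file_size change is the likely culprit: MySQL versions of this
era refuse to start when the existing InnoDB log files (ib_logfile0 and
ib_logfile1, in MySQL’s data directory) don’t match the configured size.
With the service stopped, move those two files somewhere safe and start the
service again; MySQL will create new ones at the new size.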
Before you import, you’re going to need to delete data in MySQL from
the default installation. Otherwise, you’ll get an error about a duplicate
entry right at the start, and then none of the rows will import. Left-click
on the dashboard with a lock in your systray, go to “MySQL”, and select
“MySQL console”. You’ll be asked for a password, which by default is
blank, so just hit Enter. Enter the following commands into the
console:
use wikidb;
delete from page;
delete from revision;
delete from text;
quit
This will delete all pages in the wiki.
Open a command window by going to Start->Run and typing cmd. Type c:
to change to the C: drive, and then cd c:\tmp to change to the directory
where you put mwdumper.jar and the wiki dump file. You’re ready to do the
import, but beware: this will take a long time. It’s best to start the process, then leave for a few hours. When you’re ready, type the following:
java -jar mwdumper.jar --format=sql:1.5 enwiki-latest-pages-articles.xml.bz2 | c:\wamp\mysql\bin\mysql -u wikiuser -p wikidb
This will begin the import process, and as noted, this will take a long time. There are over three million
pages to process, so don’t expect it to finish right away. On a
reasonably fast single-processor machine (*not* my laptop), it took me
over 24 hours.
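If you want to kick the import off from a script (say, as part of a larger batch job), the command-line pipe above maps onto two connected processes. This is a Python 3 sketch of that same pipeline, not a replacement for it; the paths mirror this article’s layout and are assumptions if you installed elsewhere. `import_commands` and `run_import` are hypothetical helper names, and mysql’s `-p` still prompts for the wikiuser password in the console.

```python
import subprocess

# Paths follow this article's layout; adjust if yours differ.
DUMP = "enwiki-latest-pages-articles.xml.bz2"
MYSQL = r"c:\wamp\mysql\bin\mysql"


def import_commands(dump=DUMP, mysql=MYSQL, user="wikiuser", db="wikidb"):
    """Return the two argv lists that make up the mwdumper | mysql pipeline."""
    dumper = ["java", "-jar", "mwdumper.jar", "--format=sql:1.5", dump]
    loader = [mysql, "-u", user, "-p", db]
    return dumper, loader


def run_import(**kwargs):
    """Stream mwdumper's SQL output straight into the mysql client."""
    dumper_cmd, loader_cmd = import_commands(**kwargs)
    dumper = subprocess.Popen(dumper_cmd, stdout=subprocess.PIPE)
    loader = subprocess.Popen(loader_cmd, stdin=dumper.stdout)
    # Close our copy so mwdumper sees a broken pipe if mysql exits early.
    dumper.stdout.close()
    loader.wait()
    return dumper.wait(), loader.returncode
```

Run from c:\tmp, `run_import()` returns the two exit codes; anything other than `(0, 0)` means one side failed.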
Usage
Using Wikipedia Offline is pretty straightforward. If you haven’t
already, start WAMP. (If you see the dashboard with a lock icon in your
systray, and it’s white, then it’s running. If not, go to
Start->Programs->WampServer->Start Wampserver.) Then, just
fire up a web browser and browse to
http://127.0.0.1/wikipedia/. If all goes well, it should be accessible just like Wikipedia, searches and all.
Anticipated Questions
- Why do this?
I’m not always connected to the Internet, and think Wikipedia is a great
resource. Now I can take it wherever I want. I suppose if I were
paranoid about Wikipedia tracking my searches, then I could do this and
do all the searches I wanted offline. Doing it this way also seemed
like a fun tech project.
- Is this legal?
Sure! Wikipedia offers all of their data for use by interested parties.
Most of the software involved is open source; the exceptions are Windows, WinRAR (shareware), and Sun’s Java runtime (free to download).
- Where are the pictures?
You can download a dump of the English Wikipedia images from here.
Wikipedia doesn’t package these with the dump for two reasons: 1) the
images might be copyrighted, so they don’t want to distribute them; and
2) the dump file would be huge. As of this writing, the dump file is about 75 GB, which was larger than the hard drive on my laptop.
- Isn’t it overkill to install full MediaWiki?
Yes, but it’s not nearly as much effort as you might think. Plus, with
WAMP, I can experiment with other types of LAMP-based software. You can
always build static pages if you’d prefer something a little more lightweight.
- Won’t the data fall out of date?
Yes, but I’m doing this more to just have a quick reference, rather than
something that’s kept constantly up to date. In that sense, it’s
similar to those encyclopedia CD-ROMs! Besides, the way the dumps are
handled, you’re guaranteed to be slightly out of date. If you really need to be that current, you should probably be going online.
- How can updates be done?
I’m presuming I can just go in mysql and delete from the ‘page’,
‘revision’, and ‘text’ tables; download a new dump; and re-import using
mwdumper.jar. I haven’t actually tried this, though.
- Can I use these instructions to run a wiki web site?
There are a few problems with this. First, the WAMP folk note that
“WAMP5 is not meant to be a production server.” Also, running a web
site takes a fair bit of security knowledge to prevent hacking, so
you’ll get yourself in trouble if you just use it to publish on the
‘net. Finally, you can’t republish the contents of Wikipedia as your
own site. So, technically, you could use the first few parts to set up
a wiki, but it’s not a good idea. You’re better off getting a proper web
host that has a one-click install of MediaWiki, such as DreamHost.
- How do I uninstall?
If you want to trash the whole thing, go to “Add or Remove Programs” in
the Control Panel, and select WAMP5. Remove it, then be sure to delete C:\wamp as well.
Source: http://blog.onetechnical.com/2006/08/31/how-to-install-wikipedia-for-offline-access/