Most of my work online involves checking mail, browsing forums for answers, reading Wikipedia for information, or social networking. With LAN cuts introduced in the IITs, it is difficult for a student to access information after 12:10 unless they break out somehow. In an earlier post, I explained, with references to my code, how to download parts of Wikipedia; I thought it would be helpful to download the whole of Wikipedia onto your computer. In this
post I will show you how Wikipedia / Stack Overflow / Gmail can be
downloaded for offline use.
Wikipedia
Requirements:
- LAMP (Linux, Apache, MySQL, PHP)
- Around 30 GB of free space in the primary partition, where MySQL stores its databases (in my case the root partition), and another 30 GB for storage
- 7 GB of free Internet download
- 3 days of free time
The static HTML dump can simply be extracted to get all the HTML files, and the required HTML file can be opened to view the content. If you download the XML dump instead, there is more work: you have to extract the articles and create your customized offline Wikipedia with the following steps.
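To get a feel for what the XML dump contains before importing it, you can stream it and pull out page titles. This is a minimal sketch, not part of the original workflow: the namespace URI and the tiny inline dump are illustrative stand-ins for a real multi-gigabyte pages-articles file, which should be streamed rather than loaded whole.

```python
import io
import xml.etree.ElementTree as ET

# Namespace used by MediaWiki XML exports of this era (illustrative).
NS = "{http://www.mediawiki.org/xml/export-0.5/}"

def iter_titles(fileobj):
    """Yield page titles from a MediaWiki XML export stream."""
    for event, elem in ET.iterparse(fileobj, events=("end",)):
        if elem.tag == NS + "page":
            yield elem.find(NS + "title").text
            elem.clear()  # free memory for pages we have already seen

# A two-page stand-in for the real dump file.
sample = io.StringIO("""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
  <page><title>Apple</title><revision><text>A fruit.</text></revision></page>
  <page><title>Banana</title><revision><text>Another fruit.</text></revision></page>
</mediawiki>""")

print(list(iter_titles(sample)))  # ['Apple', 'Banana']
```

Because `iterparse` never builds the full tree, the same loop works on the real dump opened with `bz2.open` without exhausting memory.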
- Download the latest MediaWiki and install it on your Linux/Windows machine using LAMP/WAMP/XAMPP. MediaWiki is the software that renders Wikipedia articles from the data stored in MySQL.
- MediaWiki needs a few extensions that are installed on
Wikipedia itself. Once we have MediaWiki installed, say at /var/www/wiki/, download
each of them and install them by extracting these extensions into the
/var/www/wiki/extensions directory.
The following extensions have to be installed: CategoryTree, CharInsert, Cite, ImageMap, InputBox, ParserFunctions (very important), Poem, RandomSelection, SyntaxHighlight-GeSHi, timeline and WikiHiero, all of which can be found on the MediaWiki extensions download page; follow the instructions there. In addition, you can install any skin to make your wiki look however you want. Now your own wiki is ready to use. You can add your own articles, but what we want now is to copy the original Wikipedia articles into our wiki.
- It is easier to import all the data once and then construct the indexes in MySQL than to update the indexes each time an article is added. Open MySQL and your database; the tables used in the import are text, page and revision. You can drop all the indexes on these tables now and recreate them in the last step to speed up the import.
- Now that we have our XML dump, we need to import it into the MySQL database. You can find the instructions here.
In short, to summarize the instructions found on that page: the ONLY
way to get Wikipedia onto your computer really fast is to use the mwdumper tool
to import the dump into the database. The importer built into MediaWiki is
slow and may run for several days. The following command imports the
dump into the database within about an hour (substitute your own dump
filename, MySQL username and database name):
java -jar mwdumper.jar --format=sql:1.5 pages-articles.xml.bz2 \
  | mysql -u <username> -p <databasename>
- Recreate the indexes on the tables 'page', 'revision' and 'text' and you are done.
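The reason dropping the indexes first pays off is that a bulk load with no index to maintain per row is far cheaper than one incremental index update per article. Here is a toy demonstration of that pattern, with SQLite standing in for MySQL and an illustrative table and index name:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE page (page_id INTEGER, page_title TEXT)")

# Import everything first, with no index to maintain per row...
db.executemany("INSERT INTO page VALUES (?, ?)",
               ((i, "Article_%d" % i) for i in range(10000)))

# ...then build the index once, after the bulk load.
db.execute("CREATE INDEX idx_page_title ON page (page_title)")

n, = db.execute("SELECT COUNT(*) FROM page").fetchone()
print(n)  # 10000
```

With the index in place, title lookups (`SELECT page_id FROM page WHERE page_title = ...`) are fast again, which is exactly the state mwdumper plus the final recreate-indexes step leaves you in.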
Stack-overflow
Requirements
- LAMP (Linux, Apache, MySQL, PHP)
- Around 15 GB of free space in the primary partition (in my case the root partition) and 15 GB for storage
- 4 GB of free Internet download
- Download Stack sites
- Create a database StackOverflow with the schema using the description here. (comments, posts and votes tables are enough)
- Use the code to import the data to the database. (Suitably modify the variables serveraddress, port, username, password, databasename, rowspercommit, filePath and site in the code)
- Run the code on the Stack Mathematics dump to import the mathematics site. Bigger sites take much longer and need a lot of optimization, along with a lot of disk space in the primary partition where MySQL stores its databases.
- Use the UI PHP files to view a post, given its post number, along with its comments and replies.
- TODO: additionally, we can add a search engine that searches the 'posts' table for a query and returns the numbers of the matching posts.
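The import-plus-search idea above can be sketched end to end. Stack Exchange dumps store each post as a `<row>` element whose attributes are the columns; the sketch below parses a two-row inline stand-in into a database and then runs a LIKE query over the 'posts' table. SQLite replaces MySQL here so the example is self-contained, and the sample data and column subset are illustrative.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A tiny stand-in for a Stack Exchange posts.xml dump file.
sample_posts_xml = """<posts>
  <row Id="1" PostTypeId="1" Title="What is a limit?" Body="Definition of a limit." />
  <row Id="2" PostTypeId="2" ParentId="1" Body="A limit describes behaviour near a point." />
</posts>"""

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, parent_id INTEGER,"
           " title TEXT, body TEXT)")

# Each <row> attribute maps to a column; missing attributes become NULL.
for row in ET.fromstring(sample_posts_xml):
    db.execute("INSERT INTO posts VALUES (?, ?, ?, ?)",
               (int(row.get("Id")), row.get("ParentId"),
                row.get("Title"), row.get("Body")))

def search(query):
    """The TODO search engine: ids of posts whose body matches the query."""
    cur = db.execute("SELECT id FROM posts WHERE body LIKE ? ORDER BY id",
                     ("%" + query + "%",))
    return [r[0] for r in cur]

print(search("limit"))  # [1, 2]
```

A LIKE scan is fine for a small site like Stack Mathematics; for Stack Overflow itself you would want a full-text index on the body column instead.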
Gmail offline
Requirements:
- Windows / Mac preferred
- Firefox preferred
- 20 minutes for setup
- 1 hour for download
Gmail allows offline usage of mail, chats, calendar data and
contacts. You can follow this simple series of steps to get
Gmail on your computer.
- Install Google Gears for Firefox
- You can install Google Gears from the site http://gears.google.com
- If you are on Linux, you can install the gears package. [sudo apt-get install xul-ext-gears]
- Note: Gears works well on Windows but may fail on Linux
- Login to gmail
- Create new label “offline-download”
- Create a filter {[subject contains: "Chat with"] or [from:
]} -> add the label "offline-download" to selectively download your conversations.
- Enable offline Gmail in Settings, and allow it to download "offline-download" for 5 years. You can select a different period of time as well.
- Start the download; it will finish in around an hour, and you will have your mail on your computer.
Offline Gmail creates a database called [emailID]@gmail.com#database on your computer. The Gears site gives you its location. You can find some information about offline Gmail here.
If you want a custom interface for your mail / chats etc., you can
create one that queries the SQLite database mentioned above and presents
the content however you want. The software diarymaker can
be used to read your chat data, plot message frequencies over time and
rank your friends by interactivity. It works on Linux and uses
the Qt platform. I will add a post on it soon.
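The friend-ranking idea reduces to one query and one counter. The sketch below uses a simplified stand-in 'messages' table; the real Gears database for [emailID]@gmail.com#database has its own schema, which you would need to inspect first (for instance with the sqlite3 shell's .schema command) before adapting the query.

```python
import sqlite3
from collections import Counter

# Stand-in for the offline Gmail SQLite database; schema is illustrative.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE messages (sender TEXT, body TEXT)")
db.executemany("INSERT INTO messages VALUES (?, ?)", [
    ("alice", "hi"), ("bob", "hello"), ("alice", "lunch?"), ("alice", "ok"),
])

# Rank friends by message count, most interactive first.
counts = Counter(sender for (sender,) in db.execute("SELECT sender FROM messages"))
print(counts.most_common())  # [('alice', 3), ('bob', 1)]
```

Grouping the same query by day instead of by sender gives the frequency-over-time plots mentioned above.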
Feel free to comment on any issue. If you have an idea for
downloading any other kind of data onto your computer for offline
usage, please let us know with a comment.
Update: you can now download Stack Overflow directly from media10.simplex.tv/content/xtendx/stu/stackoverflow. (Courtesy: Sagar Borkar)
Source: http://kashthealien.wordpress.com/2011/08/06/wikipedia-offline/