Wikipedia Offline

Most of my work online involves checking mail, browsing forums for answers, reading Wikipedia for information, or social networking. With LAN cuts introduced in the IITs, it is difficult for a student to access information after 12:10 unless they break out somehow. In an earlier post, I explained, with references to my code, how to download parts of Wikipedia; I thought it would be helpful to download the whole of Wikipedia onto your computer. In this post I will show you how Wikipedia, Stack Overflow and Gmail can be downloaded for offline use.
Wikipedia
Requirements:

  • LAMP (Linux, Apache, MySQL, PHP)
  • Around 30 GB of space in the primary partition (in my case, the root partition) and 30 GB of space for storage
  • 7 GB of free Internet download
  • 3 days of free time
Wikipedia dumps can be downloaded from the Wikipedia site in XML format, compressed as a 7zip archive. The dump is around 6 GB when compressed and expands to around 25 GB of XML pages. It doesn’t include any images. This page shows how one can extract the text of articles and construct corpora from it. Apart from this, a static HTML dump (wikipedia-en-html.tar.7z) can also be downloaded from the Wikipedia page, and this version includes images. The compressed version is about 15 GB and it expands to over 200 GB because of all the images.
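To get a feel for what the XML dump contains, here is a minimal Python sketch that streams through a MediaWiki-style export and yields one (title, text) pair per page. The sample document and the function names are made up for illustration; a real dump would first have to be decompressed and would then be opened as a file instead of the in-memory sample.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_articles(stream):
    """Yield (title, wikitext) pairs from a MediaWiki XML export stream."""
    title, text = None, ""
    # iterparse reads the file incrementally instead of loading it whole.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # drop the finished page's children to save memory


# Tiny hand-written sample in the shape of a MediaWiki export (illustrative only).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Alpha</title>
    <revision><text>First article.</text></revision>
  </page>
  <page>
    <title>Beta</title>
    <revision><text>Second article.</text></revision>
  </page>
</mediawiki>"""

articles = list(iter_articles(io.BytesIO(SAMPLE)))
for title, text in articles:
    print(title, "->", text)
```

Streaming matters here because the uncompressed dump is far larger than typical RAM, so building the whole tree in memory is not an option.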
The static HTML dump can simply be extracted to get all the HTML files, and the required HTML file can then be opened to view its content. If you download the XML dump, there is more to do: you have to extract the articles and create your customized offline Wikipedia with the following steps.
  1. Download the latest MediaWiki and install it on your Linux/Windows machine using LAMP/WAMP/XAMPP. MediaWiki is the software that renders Wikipedia articles using the data stored in MySQL.
  2. MediaWiki needs a few extensions that are installed on Wikipedia. Once MediaWiki is installed, say in /var/www/wiki/, download each extension and install it by extracting it into the /var/www/wiki/extensions directory.
    The following extensions have to be installed: CategoryTree, CharInsert, Cite, ImageMap, InputBox, ParserFunctions (very important), Poem, randomSelection, SyntaxHighlight-GeSHi, timeline, wikihero. All of them can be found on the MediaWiki extensions download page; follow the instructions given there. In addition, you can install any template to make your wiki look however you want. Your own wiki is now ready to use, and you can add your own articles, but what we want now is to copy the original Wikipedia articles into it.
  3. It is faster to import all the data first and then build the indexes in MySQL than to update the indexes each time an article is added. Open MySQL and select your database; the tables used in the import are text, page and revision. You can delete all the indexes on these tables and create them again in step 5 to speed up the process.
  4. Now that we have our XML dump, we need to import it into the MySQL database. You can find the instructions here. In short, the only way to import Wikipedia quickly is to use the mwdumper tool. The import tool built into MediaWiki is slow and may run for several days. The following command can be used to import the dump into the database within an hour.
    java -jar mwdumper.jar --format=sql:1.5 <dump-file.xml> | mysql -u <username> -p <database-name>
  5. Recreate the indexes on the tables ‘page’, ‘revision’ and ‘text’ and you are done.
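The idea behind dropping the indexes before the import and recreating them afterwards can be shown in miniature. The sketch below is only an illustration of the load-then-index order: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table, column and index names are simplified assumptions, not the full MediaWiki schema.

```python
import sqlite3


def bulk_load(rows):
    """Insert all rows first, then build the index once at the end."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (page_id INTEGER, page_title TEXT)")
    # Step 1: load everything while the table has no secondary index,
    # so each insert only appends a row.
    with conn:
        conn.executemany("INSERT INTO page VALUES (?, ?)", rows)
    # Step 2: build the index in a single pass over the loaded data.
    conn.execute("CREATE INDEX idx_page_title ON page (page_title)")
    return conn


rows = [(i, f"Article_{i}") for i in range(1000)]
conn = bulk_load(rows)

count = conn.execute("SELECT COUNT(*) FROM page").fetchone()[0]
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT page_id FROM page WHERE page_title = ?",
    ("Article_42",),
).fetchall()
# The last column of each plan row is a human-readable description.
uses_index = any("idx_page_title" in row[-1] for row in plan)
print(count, uses_index)
```

Lookups by title use the index once it exists, so nothing is lost by creating it late; the import simply avoids paying the index-maintenance cost on every inserted row.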
Leave a comment if you want to try this or if you run into any problems along the way.
Stack Overflow
Requirements:
  • LAMP (Linux, Apache, MySQL, PHP)
  • Around 15 GB of space in the primary partition (in my case, the root partition) and 15 GB of space for storage
  • 4 GB of free Internet download
media10.simplex.tv/content/xtendx/stu/stackoverflow has several Stack Overflow zip files available for direct download. Alternatively, the Stack Overflow dumps can be downloaded using a torrent. A torrent download can be converted into an FTP download using http://www.torrific.com/home/. Once you have the dumps, you can unpack them to get huge XML files for several Stack sites. Stack Overflow is one of the Stack sites; its 7zip file is broken into 4 parts, which have to be combined using a command (cat a.xml b.xml c.xml d.xml > full.xml).
Once combined and extracted, there are 6 XML files for each site (badges, comments, postHistory, posts, users, votes). Among these, comments, posts and votes are the ones that seem useful for offline use of the forum. A main post may have several reply posts, and each such post may have follow-up comments. Votes are used to rate an answer, and they can be used as signals while you browse through questions. Use the following steps to import the data into the database and browse posts offline through the UI.
  • Download Stack sites
  • Create a database named StackOverflow with the schema from the description here (the comments, posts and votes tables are enough).
  • Use the code to import the data into the database (suitably modify the variables serveraddress, port, username, password, databasename, rowspercommit, filePath and site in the code).
  • Run the code on Stack Mathematics to import the mathematics site. For bigger sites, the import may take much longer and needs many optimizations, along with a lot of disk space in the primary partition where MySQL stores its databases.
  • Use the UI PHP files to view a post, given the post number, along with its comments and replies.
  • TODO: Additionally, we can add a search engine that searches the table ‘posts’ for a query and returns the numbers of the matching posts.
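The import and search steps can be sketched as follows. This is not the import code referred to above; it is a minimal Python illustration that uses the built-in sqlite3 module instead of MySQL, keeps only a few columns of the posts file, and implements the search as a plain substring match. The sample rows, the column subset and the function names are assumptions made for the example.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

ROWS_PER_COMMIT = 2  # tiny for the demo; a real import would use thousands


def import_posts(stream, conn, rows_per_commit=ROWS_PER_COMMIT):
    """Stream <row .../> elements into a posts table, committing in batches."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "id INTEGER PRIMARY KEY, parent_id INTEGER, score INTEGER, "
        "title TEXT, body TEXT)"
    )
    total = pending = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "row":
            continue
        a = elem.attrib
        parent = int(a["ParentId"]) if "ParentId" in a else None
        conn.execute(
            "INSERT INTO posts VALUES (?, ?, ?, ?, ?)",
            (int(a["Id"]), parent, int(a.get("Score", 0)),
             a.get("Title"), a.get("Body", "")),
        )
        total += 1
        pending += 1
        if pending >= rows_per_commit:
            conn.commit()  # flush one batch to the database
            pending = 0
        elem.clear()  # release the parsed row
    conn.commit()  # flush the final partial batch
    return total


def search_posts(conn, query):
    """Return the ids of posts whose title or body contains the query."""
    like = f"%{query}%"
    cur = conn.execute(
        "SELECT id FROM posts WHERE title LIKE ? OR body LIKE ? ORDER BY id",
        (like, like),
    )
    return [row[0] for row in cur]


# Hand-written sample in the shape of a posts file (illustrative only).
SAMPLE = b"""<posts>
  <row Id="1" PostTypeId="1" Score="5" Title="How do I invert a matrix?" Body="Looking for a method." />
  <row Id="2" PostTypeId="2" ParentId="1" Score="3" Body="Use Gaussian elimination." />
  <row Id="3" PostTypeId="1" Score="1" Title="What is a prime?" Body="Definition wanted." />
</posts>"""

conn = sqlite3.connect(":memory:")
imported = import_posts(io.BytesIO(SAMPLE), conn)
print(imported, search_posts(conn, "matrix"))
```

Committing every few thousand rows rather than after each row keeps the import fast, and replies to a question can be fetched with a query on parent_id. For a large site, a proper full-text index would be a better fit than LIKE, which scans every row.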
Gmail offline
Requirements:
  • Windows / Mac preferred
  • Firefox preferred
  • 20 minutes for setup
  • 1 hour for download
Gmail allows offline use of mail, chats, calendar data and contacts. Follow this simple series of steps to get Gmail on your computer.
  • Install Google Gears for Firefox
    • You can install Google Gears from the site http://gears.google.com
    • If you are on Linux, you can install the gears package [sudo apt-get install xul-ext-gears]
    • Note: Gears works well on Windows but may fail on Linux
  • Log in to Gmail
  • Create a new label “offline-download”
  • Create a filter {[subject contains: "Chat with"] or [from: ]} -> add label “offline-download” to selectively download your conversations.
  • Enable offline Gmail in settings and allow the label “offline-download” to be downloaded for 5 years. You can select a different period of time as well.
  • Start the download; it takes around an hour, after which you will have your mails on your computer.
Offline Gmail creates a database called [emailID]@gmail.com#database on your computer. The Gears site gives you the location. You can find some information about offline Gmail here.
If you want a custom interface for your mails, chats, etc., you can create one that queries the SQLite database mentioned above and presents the content however you want. The software diarymaker can be used to read your chat data, plot message frequencies over time and rank your friends based on interactivity. It works on Linux and uses the Qt platform. I will add a post on it soon.
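A reasonable first step towards such a custom interface is to find out which tables and columns the SQLite database actually contains. The sketch below does this with Python's built-in sqlite3 module. The layout of the offline Gmail database is not described in this post, so the demo runs against a small made-up table; the table and column names in it are purely hypothetical.

```python
import sqlite3


def describe_database(conn):
    """Return {table_name: [column names]} for an open SQLite connection."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    layout = {}
    for table in tables:
        # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
        info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        layout[table] = [column[1] for column in info]
    return layout


# Stand-in database with a hypothetical table, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER, sender TEXT, body TEXT)")

layout = describe_database(conn)
print(layout)
```

To inspect the real file, sqlite3.connect would be given the path to the database that the Gears site reports, ideally a copy of it so that the browser's own data is left untouched.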
Feel free to comment on any issue. If you have an idea for downloading any other kind of data onto your computer for offline use, please let us know with a comment.
Update:
Stack Overflow can now be downloaded directly from media10.simplex.tv/content/xtendx/stu/stackoverflow. (Courtesy: Sagar Borkar)
 
Source: http://kashthealien.wordpress.com/2011/08/06/wikipedia-offline/
