Download a Whole Website in Linux the Smart Way!

by Vivek Yadav on September 7, 2008

Have you ever searched Google for software to download a complete website, only to find a Windows program, or maybe a Linux one too? Did you know that your Linux box already has a nifty command that makes all your troubles go away and downloads a full website with a single command? Yes! wget does it. Here is the command: just copy and paste it into the shell and edit the website details at the bottom.

$ wget \
    --recursive \
    --no-clobber \
    --page-requisites \
    --html-extension \
    --convert-links \
    --restrict-file-names=windows \
    --domains techstroke.com \
    --no-parent \
    www.techstroke.com/Windows/


This command downloads the Web site www.techstroke.com/Windows/.

The options are:

  • --recursive: follow links and download the entire Web site.
  • --domains techstroke.com: don’t follow links outside techstroke.com.
  • --no-parent: don’t follow links outside the directory /Windows/.
  • --page-requisites: get all the elements that compose the page (images, CSS and so on).
  • --html-extension: save HTML pages with the .html extension (newer wget releases call this
    option --adjust-extension).
  • --convert-links: convert links so that they work locally, off-line.
  • --restrict-file-names=windows: modify filenames so that they will work in Windows as well.
  • --no-clobber: don’t overwrite any existing files (used in case the download is interrupted and
    resumed).
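
If you mirror sites often, the command above can be wrapped in a small shell script. What follows is only a minimal sketch, not an official tool: the function names url_host and mirror_site and the DRY_RUN variable are made up for this example. It assumes that the host part of the URL is a reasonable default for --domains, and lets you pass a second argument (such as techstroke.com) to override it.

```shell
#!/bin/sh
# mirror-site.sh: hypothetical wrapper around the wget command shown above.

# Print the host part of a URL,
# e.g. "www.techstroke.com/Windows/" gives "www.techstroke.com".
url_host() {
    host=${1#*://}                # drop a leading scheme such as http://, if any
    printf '%s\n' "${host%%/*}"   # drop everything from the first slash onwards
}

# Mirror a URL with the options from the article.
#   $1  the URL to download
#   $2  optional value for --domains (assumption: defaults to the URL's host)
# Set DRY_RUN to any non-empty value to print the command instead of running it.
mirror_site() {
    url=$1
    domain=${2:-$(url_host "$url")}
    # Rebuild the positional parameters as the full wget command line.
    set -- wget \
        --recursive \
        --no-clobber \
        --page-requisites \
        --html-extension \
        --convert-links \
        --restrict-file-names=windows \
        --domains "$domain" \
        --no-parent \
        "$url"
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Only do something when a URL is given on the command line.
if [ "$#" -ge 1 ]; then
    mirror_site "$@"
fi
```

Saved as mirror-site.sh, it could be run as "sh mirror-site.sh www.techstroke.com/Windows/ techstroke.com". Running it with DRY_RUN=1 in front prints the wget command without downloading anything, which is handy for checking the options first.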

All these options are uber cool: together they download a browsable copy of the site with all images, JavaScript and CSS intact!

via [linuxJournal]


{ 5 comments… read them below or add one }

Rava March 28, 2009 at 3:16 am

Hi, I tried that with a website. It is served over https, but all was OK when using --no-check-certificate.

I used “-r --no-check-certificate --page-requisites” only. The problem: the main CSS file is saved, and in it the other CSS files are linked with
@import url(bla.css);
@import url(blubb.css);

and none of these are saved at all. Is there a wget tweak that makes wget save these files as well?

--page-requisites itself should be enough to make wget do it, but it seems it just won’t. Using wget 1.11.1.

Vivek April 6, 2009 at 12:25 am

@Rava I don’t think we have such a tweak. wget parses HTML files but not CSS ones. I went through the wget manual and found nothing regarding this. You can google for a website copy tool for Linux if you need this done.

Chrissy September 26, 2009 at 6:01 pm

Wow…

Swapnil March 4, 2010 at 6:07 am

Thanks for sharing this … the best

taplinb March 19, 2010 at 8:25 am

Sweet!

Just snatched a site into Linux Mint 8. Had to replace the leading “-” characters with “--”, as in -recursive became --recursive, but then it worked great. Rusty old ex-tech’s request: can you make it a simple bash script? I suspect it’s just the above with variables, but I forget how to parse. Please don’t say RTFM; I lack the time this month, but could really use this.

Many thanks.

-Brad
