Using wget to download a whole website

Recently, I came across some e-books that are HTML only (sucks, yeah), but they are good books and I really want to have them locally. So I need to download ’em.

I know, there are GUI tools for this. But what if you are stuck on a terminal-only server? I am behind a very strict proxy, but I have a server I can FTP into, and that server is not behind the proxy. It is terminal only, though, hence the wget option.

wget can download the whole internet if you so wish, and it’s simple:

wget -r url

Now, before you go, there are a few caveats.

The site will be downloaded, but it will not really be suitable for offline viewing, because the links will still point at the live site. To have wget convert the links for local viewing, add -k:

wget -rk url

The above converts the links in the downloaded files so they work offline. If you want wget to also keep the original, unconverted files, add -K:

wget -rkK url

Another caveat: the options above only download the HTML files. To tell wget to fetch everything needed to display a page properly (images, sounds, linked CSS, etc.), add -p:

wget -rkp url

Again, don’t go yet. The default recursion depth is 5 levels of links. That might be too much (or too little, in case your plan really is to download the whole internet). You can set the depth with -l:

wget -rkpl 5 url
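Putting those flags together, here is a sketch of the command I would reach for with an HTML e-book. The URL is just a placeholder, and two extras go beyond the flags covered above, though both are standard wget options: -E saves files with an .html extension where needed, and --no-parent stops the crawl from climbing above the starting directory.

```shell
# Hypothetical example: mirror a single e-book for offline reading.
# -r recurse, -k convert links, -p grab page requisites (images, CSS, ...),
# -E save files with an .html extension where appropriate,
# -l 5 limit recursion depth, --no-parent never ascend above the start dir.
wget -rkpE -l 5 --no-parent https://example.com/book/index.html
```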

Finally, you might want wget to do all the hard work of downloading the internet and then delete the files immediately after:

wget -r --delete-after url

man wget

is also a good place to start learning more about the things that wget can do.
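If the site you are mirroring is large, it is also polite to throttle the crawl. This is a sketch using standard wget options (the URL and the values are just examples):

```shell
# Hypothetical example: a gentler recursive download.
# --wait=1          pause one second between requests
# --random-wait     vary that pause so the crawl looks less mechanical
# --limit-rate=200k cap bandwidth at roughly 200 KB/s
wget -rkp --wait=1 --random-wait --limit-rate=200k https://example.com/
```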

That’s it. Happy interwebs downloading.
