Overview
wget -r recursively follows links on a web server and downloads the files and directories it finds, up to a specified depth. It is useful for website mirroring, offline browsing, and collecting specific types of files.
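Before committing to a full download, the same recursion can be previewed with `--spider`, which follows links without saving anything to disk; a quick sketch using a placeholder URL:
wget -r --spider https://example.com/docs/
This helps gauge how many pages a recursive download would touch and can reveal broken links before any files are written.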
Key Features
- Full website mirroring
- Follows links up to a specified depth
- File type filtering
- Converts links for offline browsing
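These features can be combined in a single invocation. A minimal sketch, assuming a hypothetical /manuals/ path and a PDF-only filter:
wget -r -l 2 -A "*.pdf" --wait=1 https://example.com/manuals/
This limits recursion to two levels, keeps only PDF files, and pauses one second between requests.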
Key Options
These are the main options for wget -r that allow fine-grained control over the recursive download behavior.
Recursive Download Control
- `-r`, `--recursive`: Enables recursive retrieval.
- `-l <depth>`, `--level=<depth>`: Limits the maximum recursion depth (default: 5).
- `-np`, `--no-parent`: Never ascends to the parent directory.
- `-A <patterns>`, `--accept=<patterns>`: Comma-separated list of file name patterns to download.
- `-k`, `--convert-links`: Converts links in downloaded pages for local viewing.
- `-p`, `--page-requisites`: Downloads images, CSS, and other files needed to display a page.
Saving and Output
- `-P <dir>`, `--directory-prefix=<dir>`: Directory to save downloaded files into.
- `--limit-rate=<rate>`: Caps the download speed (e.g., `200k`).
- `--wait=<seconds>`: Waits the given number of seconds between retrievals.
Usage Examples
Practical examples of using the wget -r command.
Basic Recursive Download
wget -r https://example.com/docs/
Recursively downloads all content from a specified URL.
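Note that the default maximum recursion depth is 5; if the entire tree is needed regardless of depth, the limit can be lifted, for example:
wget -r -l inf https://example.com/docs/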
Website Mirroring (Depth Limit, Link Conversion)
wget -r -l 2 -k -p https://example.com/
Downloads a website up to a link depth of 2, fetches page requisites such as images and CSS (`-p`), and converts links to local paths (`-k`) for offline viewing.
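For full mirrors, `-m` (`--mirror`) is a convenient shorthand roughly equivalent to `-r -N -l inf --no-remove-listing`; combined with `-k` and `-p` it is a common way to build a browsable local copy:
wget -m -k -p https://example.com/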
Download Only Specific File Types
wget -r -A "*.pdf,*.doc" https://example.com/files/
Recursively downloads only PDF and DOC files from a specified directory.
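The inverse filter is `-R` (`--reject`), which skips matching files instead; for example, to fetch everything except archives (patterns are illustrative):
wget -r -R "*.zip,*.tar.gz" https://example.com/files/
With either filter, HTML pages may still be fetched so that further links can be discovered, and are deleted afterwards if they do not pass the filter.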
Download Without Ascending to Parent Directories
wget -r -np https://example.com/data/
Recursively downloads only content at or below the given directory (`/data/`) and never ascends into its parent directories.
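By default, wget recreates the remote layout under a directory named after the host; if a flatter tree is preferred, `-nH` (no host directory) and `--cut-dirs` can strip leading path components, as in this sketch:
wget -r -np -nH --cut-dirs=1 https://example.com/data/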
Specify Download Directory
wget -r -P /home/user/websites https://example.com/
Saves all downloaded files to a specific local directory (`/home/user/websites`).
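If the remote directory hierarchy itself is not needed, `-nd` (`--no-directories`) stores every file directly in the target directory; a sketch combining it with `-P` and a type filter (paths are placeholders):
wget -r -nd -A "*.pdf" -P /home/user/downloads https://example.com/files/
When file names collide, wget appends numeric suffixes such as `.1` rather than overwriting.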
Limit Download Rate and Set Wait Time
wget -r --limit-rate=200k --wait=1 https://example.com/large-site/
Limits the download speed to 200KB/s and waits 1 second between each request to reduce server load.
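To make the request pattern less uniform, `--random-wait` varies the delay between retrievals (roughly 0.5x to 1.5x the `--wait` value):
wget -r --limit-rate=200k --wait=2 --random-wait https://example.com/large-site/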
Tips & Precautions
When using wget -r, take care not to overload the target server, and download only the files you actually need.
Useful Tips
- Reduce server load with the `--wait` option: sending many requests in quick succession can burden a server, so adding a delay such as `--wait=1` (1 second between requests) is recommended.
- Limit bandwidth with `--limit-rate`: You can restrict download speeds to avoid excessive bandwidth usage.
- Respect the Robots Exclusion Protocol (`robots.txt`): Most websites specify crawling rules via their `robots.txt` file. `wget` adheres to this by default, but you can override it with `--execute=robots=off` (not recommended).
- Use the `-l` (depth) and `-np` (no parent) options to keep recursion bounded: without them, a misconfigured recursive download can run away across the site and pull in an excessive number of unintended files.
- Use the `-k` (convert links) and `-p` (page requisites) options together for offline browsing: these two options are what make a downloaded website navigable locally; a combined example follows this list.
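Putting these tips together, a polite recursive download might look like the following sketch (the depth, delay, and rate are illustrative choices, not fixed recommendations):
wget -r -l 2 -np -k -p --wait=1 --limit-rate=500k https://example.com/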