Overview
wget -r recursively follows links on a web server and downloads the files and directories it finds, up to a specified depth. It is useful for website mirroring, offline browsing, and collecting specific types of files.
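Before committing to a full download, the same recursion can be previewed with `--spider`, which follows links without saving anything to disk; a quick sketch using a placeholder URL:
wget -r --spider https://example.com/docs/
This helps gauge how many pages a recursive download would touch and can reveal broken links before any files are written.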
Key Features
- Full website mirroring
- Follows links up to a specified depth
- File type filtering
- Converts links for offline browsing
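These features can be combined in a single invocation. A minimal sketch, assuming a hypothetical /manuals/ path and a PDF-only filter:
wget -r -l 2 -A "*.pdf" --wait=1 https://example.com/manuals/
This limits recursion to two levels, keeps only PDF files, and pauses one second between requests.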
Key Options
These are the main options for wget -r that allow fine-grained control over the recursive download behavior.
Recursive Download Control
- `-r`, `--recursive`: Enables recursive retrieval.
- `-l <depth>`, `--level=<depth>`: Limits the maximum recursion depth (default: 5).
- `-np`, `--no-parent`: Never ascends to the parent directory.
- `-A <patterns>`, `--accept=<patterns>`: Comma-separated list of file name patterns to download.
- `-k`, `--convert-links`: Converts links in downloaded pages for local viewing.
- `-p`, `--page-requisites`: Downloads images, CSS, and other files needed to display a page.
Saving and Output
- `-P <dir>`, `--directory-prefix=<dir>`: Directory to save downloaded files into.
- `--limit-rate=<rate>`: Caps the download speed (e.g., `200k`).
- `--wait=<seconds>`: Waits the given number of seconds between retrievals.
Usage Examples
Practical examples of using the wget -r command.
Basic Recursive Download
wget -r https://example.com/docs/
Recursively downloads all content from a specified URL.
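Note that the default maximum recursion depth is 5; if the entire tree is needed regardless of depth, the limit can be lifted, for example:
wget -r -l inf https://example.com/docs/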
Website Mirroring (Depth Limit, Link Conversion)
wget -r -l 2 -k -p https://example.com/
Downloads a website up to a link depth of 2, fetches page requisites such as images and CSS (`-p`), and converts links to local paths (`-k`) for offline viewing.
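For full mirrors, `-m` (`--mirror`) is a convenient shorthand roughly equivalent to `-r -N -l inf --no-remove-listing`; combined with `-k` and `-p` it is a common way to build a browsable local copy:
wget -m -k -p https://example.com/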
Download Only Specific File Types
wget -r -A "*.pdf,*.doc" https://example.com/files/
Recursively downloads only PDF and DOC files from a specified directory.
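The inverse filter is `-R` (`--reject`), which skips matching files instead; for example, to fetch everything except archives (patterns are illustrative):
wget -r -R "*.zip,*.tar.gz" https://example.com/files/
With either filter, HTML pages may still be fetched so that further links can be discovered, and are deleted afterwards if they do not pass the filter.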
Download Without Ascending to Parent Directories
wget -r -np https://example.com/data/
Recursively downloads only content at or below the given directory (`/data/`) and never ascends into its parent directories.
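By default, wget recreates the remote layout under a directory named after the host; if a flatter tree is preferred, `-nH` (no host directory) and `--cut-dirs` can strip leading path components, as in this sketch:
wget -r -np -nH --cut-dirs=1 https://example.com/data/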
Specify Download Directory
wget -r -P /home/user/websites https://example.com/
Saves all downloaded files to a specific local directory (`/home/user/websites`).
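If the remote directory hierarchy itself is not needed, `-nd` (`--no-directories`) stores every file directly in the target directory; a sketch combining it with `-P` and a type filter (paths are placeholders):
wget -r -nd -A "*.pdf" -P /home/user/downloads https://example.com/files/
When file names collide, wget appends numeric suffixes such as `.1` rather than overwriting.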
Limit Download Rate and Set Wait Time
wget -r --limit-rate=200k --wait=1 https://example.com/large-site/
Limits the download speed to 200KB/s and waits 1 second between each request to reduce server load.
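To make the request pattern less uniform, `--random-wait` varies the delay between retrievals (roughly 0.5x to 1.5x the `--wait` value):
wget -r --limit-rate=200k --wait=2 --random-wait https://example.com/large-site/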
Tips & Precautions
When using wget -r, take care not to overload the target server, and download only the files you actually need.
Useful Tips
- Reduce server load with the `--wait` option: sending many requests in quick succession can burden a server, so adding a delay such as `--wait=1` (1 second between requests) is recommended.
- Limit bandwidth with `--limit-rate`: You can restrict download speeds to avoid excessive bandwidth usage.
- Respect the Robots Exclusion Protocol (`robots.txt`): Most websites specify crawling rules via their `robots.txt` file. `wget` adheres to this by default, but you can override it with `--execute=robots=off` (not recommended).
- Use the `-l` (depth) and `-np` (no parent) options to keep recursion bounded: without them, a misconfigured recursive download can run away across the site and pull in an excessive number of unintended files.
- Use the `-k` (convert links) and `-p` (page requisites) options together for offline browsing: these two options are what make a downloaded website navigable locally; a combined example follows this list.
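Putting these tips together, a polite recursive download might look like the following sketch (the depth, delay, and rate are illustrative choices, not fixed recommendations):
wget -r -l 2 -np -k -p --wait=1 --limit-rate=500k https://example.com/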