Overview
The `-L` (`--relative`) option tells wget, during a recursive download (`-r`), to follow only relative links. Recursive wget already stays on the starting host unless `-H` is given; `-L` narrows the crawl further by skipping links written as absolute URLs, even when they point to the same host. This keeps a mirror of a specific subdirectory or section of a website from pulling in unrelated pages.
Key Features
- Follows only relative links, so links written as absolute URLs, including those to external domains, are skipped.
- Suitable for mirroring specific sections of a website.
- Takes effect only during recursive download (`-r`), where wget decides which links to follow.
- Prevents unnecessary data downloads and bandwidth waste.
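To make "relative link" concrete, here is a minimal sketch that sorts the links of a sample page into relative and absolute ones. The page content is made up for illustration, and the rule used (no scheme and no leading `/`) is only an approximation of how wget classifies links; wget's own parser is the authority.

```shell
# Hypothetical sample page; the links are invented for illustration.
page=$(mktemp)
cat > "$page" <<'EOF'
<a href="guide/intro.html">Intro</a>
<a href="../about.html">About</a>
<a href="/docs/faq.html">FAQ</a>
<a href="http://example.com/docs/api.html">API</a>
<a href="https://other.example.org/">Elsewhere</a>
EOF

# Pull out the href values, one per line.
links=$(grep -o 'href="[^"]*"' "$page" | sed 's/^href="//; s/"$//')

# Assumed rule: a link is relative if it has no scheme and no leading slash.
pattern='^([a-zA-Z][a-zA-Z0-9+.-]*:|/)'
relative=$(printf '%s\n' "$links" | grep -Ev "$pattern")
absolute=$(printf '%s\n' "$links" | grep -E "$pattern")

echo "Relative (candidates for -L):"
printf '%s\n' "$relative" | sed 's/^/  /'
echo "Absolute (skipped under -L):"
printf '%s\n' "$absolute" | sed 's/^/  /'

rm -f "$page"
```

Note that `../about.html` counts as relative here; whether wget actually fetches it also depends on options such as `-np`.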
Key Options
The `-L` option truly shines when combined with other `wget` options, rather than being used in isolation.
Usage Examples
Various usage examples of `wget` utilizing the `-L` option.
Recursively download only relative links
wget -r -L -np http://example.com/docs/
Recursively download the website starting from the specified URL, following only relative links. Does not ascend to parent directories.
Convert links for local use after download
wget -r -L -np -k http://example.com/docs/
Download as in the previous example, but convert links within the downloaded HTML files to work locally.
Save to a specific directory
wget -r -L -np -k -P my_docs http://example.com/docs/
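Save all downloaded files under a directory named `my_docs`. By default wget still creates a host directory inside it, so the files land under `my_docs/example.com/docs/`.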
Limit download depth
wget -r -L -np -l 2 http://example.com/docs/
During recursive download, follow links only up to 2 levels deep from the starting URL.
Tips & Precautions
Useful tips and precautions when using `wget -L`.
Tips for Efficient Use
- **Set `-l` (`--level`)**: This option limits the depth of recursive downloads (the default is 5 levels), preventing unnecessary file downloads and runaway crawls.
- **`--wait` and `--random-wait`**: It is advisable to pause between requests to avoid overloading the server. `--wait` sets the pause in seconds, and `--random-wait` varies it between 0.5 and 1.5 times that value. This is particularly useful for large-scale mirroring.
- **`--limit-rate`**: You can manage network bandwidth efficiently by limiting the download speed.
- **`--no-clobber`**: This option skips files that already exist locally instead of downloading them again, which helps when re-running an interrupted mirror. It does not complete a partially downloaded file; `-c` (`--continue`) is the option for that.
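The tips above can be combined into a single command. The sketch below wraps it in a hypothetical helper, `polite_mirror`, which only prints the command unless the (also hypothetical) `RUN` variable is set to `1`. The depth, wait, and rate values are assumptions to adjust for the site being mirrored.

```shell
# Hypothetical helper: builds a rate-limited, depth-limited wget -L command.
# Prints the command by default; set RUN=1 to actually execute it.
polite_mirror() {
  url=$1
  # Assumed values: depth 2, about 1 second between requests, 200 KB/s cap.
  set -- wget -r -L -np -l 2 \
    --wait=1 --random-wait \
    --limit-rate=200k \
    --no-clobber \
    "$url"
  if [ "${RUN:-0}" = 1 ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Dry run: show the command that would be executed.
polite_mirror http://example.com/docs/
```

Printing first makes it easy to review the option set before sending any requests; `RUN=1 polite_mirror http://example.com/docs/` would then start the download.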
Precautions
- **Server Load**: Excessive recursive downloads can put a strain on the target server. It is recommended to adjust request intervals using the `--wait` option.
- **Respect robots.txt**: Many websites publish crawling rules in a `robots.txt` file, which wget honors during recursive downloads. You can disable this with `-e robots=off` (`--execute robots=off`), but doing so may violate the site's policies and should be used with caution.
- **Potential for Infinite Loops**: Incorrect option combinations can lead to infinite loops, consuming system resources. It is crucial to clearly define the scope using options like `-np` or `-l` when using `-L` and `-r`.