wget -L: Recursively Download Only Relative Links

wget is a powerful command-line utility for downloading files from web servers non-interactively. The `-L` (`--relative`) option, when used with recursive downloads, instructs wget to follow only the relative links found in retrieved pages. This is useful for downloading a specific section of a website while preserving its internal link structure: absolute links are skipped, even those pointing back to the same host, so wget never strays into external domains and collects only the desired content.

Overview

When used in conjunction with recursive download (`-r`), the `-L` option directs wget to follow relative links only; absolute links are not followed even when they point to the same domain. This plays a crucial role in preventing the download of unnecessary data from external links when mirroring specific subdirectories or sections of a website.
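
For example, a minimal invocation might look like the following (example.com is a placeholder URL):

wget -r -L http://example.com/

This recursively downloads the start page and every page reachable through relative links, skipping all absolute links.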

Key Features

  • Follows only relative links, preventing traversal to external domains.
  • Suitable for mirroring specific sections of a website.
  • Most effective when combined with recursive download (`-r`).
  • Avoids unnecessary data downloads and wasted bandwidth.

Key Options

The `-L` option truly shines when combined with other `wget` options rather than being used in isolation. Its most common companions are `-r` (recursive download), `-np` (`--no-parent`), `-k` (`--convert-links`), `-l` (`--level`), and `-P` (`--directory-prefix`), all of which appear in the examples below.

Usage Examples

The following examples show common ways to use `wget` with the `-L` option.

Recursively download only relative links

wget -r -L -np http://example.com/docs/

Recursively downloads the site starting from the specified URL, following only relative links. The `-np` (`--no-parent`) option prevents wget from ascending to parent directories.

Convert links for local use after download

wget -r -L -np -k http://example.com/docs/

Downloads as in the previous example, but the `-k` (`--convert-links`) option rewrites links in the downloaded HTML files so they work for local viewing.

Save to a specific directory

wget -r -L -np -k -P my_docs http://example.com/docs/

The `-P` (`--directory-prefix`) option saves all downloaded files under a directory named `my_docs`.

Limit download depth

wget -r -L -np -l 2 http://example.com/docs/

During the recursive download, the `-l 2` (`--level`) option follows links only up to 2 levels deep from the starting URL.

Tips & Precautions

Useful tips and precautions when using `wget -L`.

Tips for Efficient Use

  • **Limit depth with `-l` (`--level`)**: Restricting the depth of recursive downloads prevents unnecessary file downloads and guards against runaway recursion.
  • **`--wait` and `--random-wait`**: Introduce a delay between requests to avoid overloading the server. This is particularly useful for large-scale mirroring.
  • **`--limit-rate`**: Capping the download speed lets you manage network bandwidth efficiently.
  • **`--no-clobber`**: Prevents overwriting files that already exist locally, which is useful when re-running an interrupted recursive download without refetching completed files. A sketch combining these options follows below.
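
Putting these tips together, a polite mirroring command might look like this sketch (the URL, depth, delay, and rate values are illustrative):

wget -r -L -np -l 2 --wait=1 --random-wait --limit-rate=500k --no-clobber http://example.com/docs/

This limits recursion to 2 levels, pauses between requests with a randomized delay, caps the download speed at about 500 KB/s, and skips files that already exist locally.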

Precautions

  • **Server Load**: Excessive recursive downloads can put a strain on the target server. It is recommended to adjust request intervals using the `--wait` option.
  • **Respect robots.txt**: Most websites specify crawling rules in a `robots.txt` file, and wget honors it by default during recursive downloads. You can ignore it with `-e robots=off` (`--execute robots=off`), but this may violate website policies and should be used with caution.
  • **Runaway Recursion**: Poorly scoped option combinations can make a recursive download crawl far more than intended, consuming bandwidth and system resources. It is crucial to clearly define the scope using options like `-np` and `-l` when combining `-L` with `-r`.
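
Before mirroring, you can inspect a site's crawling rules directly; wget itself can print the file to the terminal (example.com is a placeholder):

wget -q -O - http://example.com/robots.txt

Here `-q` suppresses progress output and `-O -` writes the downloaded file to standard output instead of saving it to disk.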
