wget: Mirroring Websites and Saving Offline

wget is a powerful command-line utility for downloading files and websites over HTTP, HTTPS, and FTP. Its `-m` (`--mirror`) option turns on settings suitable for mirroring an entire website locally; combined with a few companion options, the copy becomes navigable offline. This is extremely useful for website backups, archiving, or reviewing content without an internet connection.

Overview

The `-m` option enables recursive download with infinite depth and timestamping, so the site's directory structure is preserved locally and repeated runs re-fetch only files that have changed on the server. By itself it does not fetch page requisites or rewrite links: add `-p` to download the images, CSS, and JavaScript each page needs, and `-k` to convert links in the downloaded files to local paths for seamless offline browsing.
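Under the hood, `-m` is simply a bundle of more basic options; per the GNU wget manual, it is currently equivalent to the combination sketched below (the URL is a placeholder):

```shell
# Per the GNU wget manual, -m (--mirror) is currently shorthand
# for the following option combination:
expansion="-r -N -l inf --no-remove-listing"
# -r                  : recursive retrieval
# -N                  : timestamping (re-fetch only changed files)
# -l inf              : infinite recursion depth
# --no-remove-listing : keep FTP .listing files
# So "wget -m URL" behaves like:
echo "wget $expansion https://example.com"
```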

Key Features

Core functionalities of website mirroring using wget -m.

  • Recursive download of the entire website, preserving its directory structure
  • Link conversion for offline browsing (with the companion `-k` option)
  • Timestamping, so interrupted or repeated mirror runs re-download only changed files
  • Compliance with the robots.txt exclusion standard by default

Key Options

Useful wget options for website mirroring.

Mirroring and Recursive Download

  • `-m`, `--mirror`: turn on options suitable for mirroring (recursion, timestamping, infinite depth)
  • `-r`, `--recursive`: recursive download
  • `-l depth`, `--level=depth`: maximum recursion depth
  • `-N`, `--timestamping`: re-download only files newer than the local copy

File Handling and Link Conversion

  • `-k`, `--convert-links`: convert links in downloaded files to local paths
  • `-p`, `--page-requisites`: download all resources (images, CSS, etc.) needed to display each page
  • `-E`, `--adjust-extension`: append `.html` to files served without an HTML extension
  • `-P prefix`, `--directory-prefix=prefix`: save files under the given directory

Download Control

  • `-w seconds`, `--wait=seconds`: wait between requests
  • `--random-wait`: randomize the wait interval
  • `--limit-rate=rate`: cap download bandwidth (e.g. `--limit-rate=200k`)
  • `-U agent`, `--user-agent=agent`: set the User-Agent header


Usage Examples

Practical examples of website mirroring using wget -m.

Basic Website Mirroring

wget -m https://example.com

Mirrors the website at the specified URL to your local machine.

Include All Resources and Convert Links

wget -m -p -k https://example.com

Downloads every resource the HTML pages need (images, CSS, etc.) and converts links to local paths. Adding `-E` (`--adjust-extension`) is also common, so pages served without an `.html` extension open correctly in a local browser.

Wait 5 Seconds Between Downloads

wget -m -w 5 https://example.com

Waits for 5 seconds between each request to avoid overloading the server.

Mirror to a Specific Directory

wget -m -P /var/www/offline_site https://example.com

Saves the downloaded website under the specified local directory. Note that `-P` sets only the prefix; wget still creates a host-named subdirectory, so the files end up in `/var/www/offline_site/example.com/`.

Specify Maximum Recursion Depth

wget -m -l 2 https://example.com

Mirrors the website but limits recursion to 2 levels from the starting URL, overriding the infinite depth that `-m` implies.

Tips & Considerations

When using wget -m, take care not to overload the target server, and make sure you have enough storage space.

Performance and Ethical Considerations

Important points to consider when mirroring websites.

  • **Server Load**: Use the `-w` (wait) option to introduce delays between requests and reduce the burden on the target server. Very short intervals can get your requests blocked.
  • **Storage Space**: Mirroring large websites can consume significant disk space. Ensure you have ample storage available beforehand.
  • **`robots.txt` Compliance**: `wget` by default respects the website's `robots.txt` file. To ignore it, you can use the `-e robots=off` option, but this may violate website policies and should be used with caution.
  • **User Agent**: It is recommended to set a user agent using the `-U` (user-agent) option so that the server can identify the request. The default `wget` user agent might be blocked by some servers.
  • **Log Checking**: `wget` outputs download progress and errors to the terminal. You can also save logs to a file using the `-o logfile.txt` option for later review.
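The tips above can be combined into a single "polite" mirror invocation. The sketch below builds and prints the command rather than running it; the user-agent string, log file name, and target directory are assumptions for illustration:

```shell
# A polite mirror run combining the recommendations above.
url="https://example.com"
opts="-m -p -k -w 5 --random-wait -U MyArchiver/1.0 -P offline_site -o mirror.log"
# -m -p -k           : mirror, fetch page requisites, convert links
# -w 5 --random-wait : pause ~5s (randomized) between requests
# -U MyArchiver/1.0  : identifiable user agent (hypothetical name)
# -P offline_site    : save under ./offline_site
# -o mirror.log      : write progress and errors to a log file
echo "wget $opts $url"
# Run the printed command (drop the echo) to start the download.
```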
