Overview
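The `-m` (`--mirror`) option in wget recursively downloads a website and recreates the site's directory structure locally. It is shorthand for `-r -N -l inf --no-remove-listing`: recursion with unlimited depth plus timestamping, so re-running the command fetches only files that have changed. Link conversion is not part of `-m`; add `-k` to rewrite links in the downloaded files to local paths for offline browsing, and `-p` to make sure each page's requisites (images, CSS, JavaScript, etc.) are fetched even when a depth limit would otherwise skip them. Mirroring is particularly useful for creating offline copies of websites for archival or review purposes.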
Key Features
Core functionalities of website mirroring using wget -m.
- Recursive download of the entire website
- Conversion of links for offline browsing when combined with `-k`
- Progress output during download; because `-m` turns on timestamping (`-N`), re-running an interrupted mirror skips files that are already up to date
- Default compliance with robots.txt exclusion standard
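As a minimal sketch of what the option stands for, the snippet below spells out the expansion of `-m` documented in the GNU Wget manual, next to the two options that are commonly added for offline browsing. It only prints the commands and downloads nothing; the variable names are illustrative and `https://example.com` is a placeholder.

```shell
#!/bin/sh
# What -m (--mirror) expands to, per the GNU Wget manual.
MIRROR_FLAGS="-r -N -l inf --no-remove-listing"
# Options that -m does NOT imply but offline browsing usually needs.
OFFLINE_FLAGS="-p -k"
URL="https://example.com"   # placeholder URL

# Print the commands instead of running them, so nothing is downloaded here.
echo "wget -m $URL"
echo "wget $MIRROR_FLAGS $URL"
echo "wget -m $OFFLINE_FLAGS $URL"
```

The first two printed commands should behave the same; the third adds page requisites and link conversion.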
Key Options
Useful wget options for website mirroring.
Mirroring and Recursive Download
File Handling and Link Conversion
Download Control
Usage Examples
Practical examples of website mirroring using wget -m.
Basic Website Mirroring
wget -m https://example.com
Mirrors the website at the specified URL to your local machine.
Include All Resources and Convert Links
wget -m -p -k https://example.com
Downloads all necessary resources for HTML pages (images, CSS, etc.) and converts links to local paths.
Wait 5 Seconds Between Downloads
wget -m -w 5 https://example.com
Waits 5 seconds between requests to avoid overloading the server.
Mirror to a Specific Directory
wget -m -P /var/www/offline_site https://example.com
Saves the downloaded website to the specified local directory (`/var/www/offline_site`).
Specify Maximum Recursion Depth
wget -m -l 2 https://example.com
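Mirrors the website but follows links only up to 2 levels deep from the starting URL. Because `-m` sets the depth to unlimited, keep `-l 2` after `-m`, as shown, so that the later option takes effect.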
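The examples above can be combined into a single command. The following is a sketch rather than a recommended recipe: `mirror` is a hypothetical helper, `DRY_RUN` is an illustrative variable, and `-np` (`--no-parent`, which keeps wget from ascending above the starting directory) is an addition not covered in the examples. By default the function only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical helper: build one mirror command from a URL and a target directory.
mirror() {
    url=$1
    dest=$2
    # -m: mirror, -p: page requisites, -k: convert links,
    # -np: stay below the starting directory, -w 5: wait 5 s between requests,
    # -P: directory prefix under which files are saved.
    set -- wget -m -p -k -np -w 5 -P "$dest" "$url"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run (default): show the command without contacting the server.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

mirror "https://example.com/docs/" "./offline_site"
```

Calling it as `DRY_RUN=0 mirror "https://example.com/docs/" "./offline_site"` would start the actual download.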
Tips & Considerations
When using `wget -m`, take care not to overload web servers, and make sure you have enough storage space.
Performance and Ethical Considerations
Important points to consider when mirroring websites.
- **Server Load**: Use the `-w` (wait) option to introduce a delay between requests and reduce the burden on the target server. Requests sent in rapid succession may cause the server to block your client.
- **Storage Space**: Mirroring large websites can consume significant disk space. Ensure you have ample storage available beforehand.
- **`robots.txt` Compliance**: `wget` by default respects the website's `robots.txt` file. To ignore it, you can use the `-e robots=off` option, but this may violate website policies and should be used with caution.
- **User Agent**: It is recommended to set a user agent using the `-U` (user-agent) option so that the server can identify the request. The default `wget` user agent might be blocked by some servers.
- **Log Checking**: `wget` outputs download progress and errors to the terminal. You can also save logs to a file using the `-o logfile.txt` option for later review.
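The points above can be sketched as one wrapper script. Everything named here is an assumption for illustration: the function names, the `DRY_RUN` variable, the user-agent string, the log file name, and the specific values. `--random-wait`, `--limit-rate`, and `-Q` (download quota) are additional wget options that are not discussed above but serve the same goals of limiting server load and disk usage.

```shell
#!/bin/sh
# Hypothetical wrapper; all names and values below are placeholders.
URL="https://example.com"
LOG="mirror.log"
UA="example-mirror/1.0"

polite_mirror() {
    # -w 5 --random-wait: pause about 5 s between requests, with some variation
    # --limit-rate=200k:  cap download bandwidth
    # -Q 500m:            stop retrieving once about 500 MB have been downloaded
    # -U / -o:            set the user agent and write the log to a file
    set -- wget -m -p -k -w 5 --random-wait --limit-rate=200k -Q 500m \
        -U "$UA" -o "$LOG" "$URL"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run (default): print the command only.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Count log lines containing "ERROR" (for example, HTTP 404 responses).
# The pattern is an assumption about how wget words its error lines.
count_errors() {
    grep -c 'ERROR' "$1"
}

polite_mirror
```

After a real run (`DRY_RUN=0`), `count_errors mirror.log` gives a quick indication of whether the log deserves a closer look.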