Can Wget be told which content types to download?

Asked: 2011-07-17 05:06:41

Tags: linux web-crawler wget

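I want to use wget to download the files linked from a website's home page, but I only want the text/html files. Is it possible to restrict wget to text/html files based on the MIME content type?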

2 answers:

Answer 0 (score: 1)

I don't think this has been implemented yet, because it is still an open entry on the bug list:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=21148

You will probably have to do all the filtering by file extension instead.
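As a rough sketch of the extension-based approach, the function below wraps wget's accept list (`-A`). The function name `fetch_html_only` and the `WGET` override variable are hypothetical, introduced here only so the command can be dry-run; the wget options themselves (`-r`, `-l`, `-A`) are standard. Note that `-A` matches on file names, not on the `Content-Type` header, so pages served as text/html from URLs without an `.html` or `.htm` suffix may be skipped.

```shell
#!/bin/sh
# Sketch only: filter by file-name suffix, since wget 1.x has no MIME filter.
# fetch_html_only and WGET are illustrative names, not part of wget.
#
#   -r         follow links recursively
#   -l 1       go one level deep (links found on the start page)
#   -A LIST    accept only files whose names end in one of these suffixes
fetch_html_only() {
    "${WGET:-wget}" -r -l 1 -A 'html,htm' "$1"
}
```

It could be called as `fetch_html_only https://example.com/` (a placeholder URL); setting `WGET=echo` prints the arguments instead of downloading.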

Answer 1 (score: 0)

Wget2 has this feature.

--filter-mime-type    Specify a list of mime types to be saved or ignored

### `--filter-mime-type=list`

Specify a comma-separated list of MIME types that will be downloaded.  Elements of list may contain wildcards.
If a MIME type starts with the character '!' it won't be downloaded, this is useful when trying to download
something with exceptions. For example, download everything except images:

  wget2 -r https://<site>/<document> --filter-mime-type=*,\!image/*

It is also useful to download files that are compatible with an application of your system. For instance,
download every file that is compatible with LibreOffice Writer from a website using the recursive mode:

  wget2 -r https://<site>/<document> --filter-mime-type=$(sed -r '/^MimeType=/!d;s/^MimeType=//;s/;/,/g' /usr/share/applications/libreoffice-writer.desktop)
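Applied to the original question, the same option can be restricted to text/html alone. This is a minimal sketch based on the manual excerpt above; the function name `fetch_by_mime` and the `WGET2` override variable are hypothetical and exist only to allow a dry run, and it assumes a wget2 build that supports `--filter-mime-type`.

```shell
#!/bin/sh
# Sketch only: keep just text/html responses during a recursive download.
# fetch_by_mime and WGET2 are illustrative names, not part of wget2.
fetch_by_mime() {
    "${WGET2:-wget2}" -r --filter-mime-type='text/html' "$1"
}
```

For example, `fetch_by_mime https://example.com/` (a placeholder URL) would run the download, while `WGET2=echo fetch_by_mime https://example.com/` only prints the arguments that would be passed.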

As of today Wget2 has not been released yet, but it will be soon. An alpha version is already available in Debian unstable.

See https://gitlab.com/gnuwget/wget2 for more information. You can send questions and comments directly to bug-wget@gnu.org.