How do I download images from local HTML files?

Time: 2011-12-02 07:29:21

Tags: linux bash command-line-interface

I have a few simple HTML pages: Test.html, test2.html, test3.html. These pages contain some image links:

<img src="http://site.org/path/to/file/6c7f2.jpeg"/>

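How can I automatically download all the images from these pages, put them next to the HTML files, and change the links in the HTML pages to point to the local images?

Thanks!

1 answer:

Answer 0 (score: 0)

Try the command $ wget -F -i <html_file>

This downloads every link contained in <html_file> and puts the files in the current directory. I recommend reading the OPTIONS section of the wget manual ($ man wget), from which I took the following: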

  

-i file   --input-file=file

  Read URLs from a local or external file.  If - is specified as file, URLs are
read from the standard input.  (Use ./- to read from a file literally named -.)

  If this function is used, no URLs need be present on the command line.  If
there are URLs both on the command line and in an input file, those on the
command lines will be the first ones to be retrieved.  If --force-html is not
specified, then file should consist of a series of URLs, one per line.

  However, if you specify --force-html, the document will be regarded as html.
In that case you may have problems with relative links, which you can solve
either by adding "<base href="url">" to the documents or by specifying
--base=url on the command line.

  If the file is an external one, the document will be automatically treated as
html if the Content-Type matches text/html. Furthermore, the file's location
will be implicitly used as base href if none was specified.

And the option:

-F   --force-html

  When input is read from a file, force it to be treated as an HTML file.
This enables you to retrieve relative links from existing HTML files on
your local disk, by adding "<base href="url">" to HTML, or using the
--base command-line option.
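
To illustrate why a base URL matters for relative links, here is a minimal Python sketch of standard URL resolution. The base URL is borrowed from the image link in the question, and the helper name resolve is hypothetical. It shows the general resolution rule that a base href implies, not wget's internal code.

```python
from urllib.parse import urljoin


def resolve(base, link):
    """Resolve a link against a base URL, as a <base href="..."> would imply."""
    return urljoin(base, link)


if __name__ == "__main__":
    # Base taken from the question's example image URL (an assumption).
    base = "http://site.org/path/to/file/"
    # A relative link is completed with the base.
    print(resolve(base, "6c7f2.jpeg"))
    # An absolute link is returned unchanged.
    print(resolve(base, "http://site.org/other/abs.png"))
```

Without a base, a relative link such as 6c7f2.jpeg in a local file has no host to be fetched from, which is the problem the manual text describes.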

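I also recommend reading about the --output-file option in the man page.

This only takes care of the downloading. For automatically changing your HTML files, I think you need other tools: shell scripting either does not offer this or makes it very complicated. I suggest calling the command above from Python to download the files, and using a dedicated Python library to parse the files and make the changes conveniently.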
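
As a rough sketch of the Python approach, the script below collects every img src with the standard library's html.parser, downloads each remote image next to the HTML file, and rewrites the link to the bare file name. Note that it downloads with urllib.request.urlretrieve rather than calling wget, and that the names ImgSrcCollector and localize_images are hypothetical. It assumes the files are UTF-8, that image file names do not collide, and that src values contain no HTML entities.

```python
import os
import sys
from html.parser import HTMLParser
from urllib.parse import urlparse
from urllib.request import urlretrieve


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img .../>.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def localize_images(html, target_dir, fetch=urlretrieve):
    """Download remote images into target_dir and return the rewritten HTML."""
    collector = ImgSrcCollector()
    collector.feed(html)
    # dict.fromkeys removes duplicates while keeping the original order.
    for src in dict.fromkeys(collector.sources):
        parts = urlparse(src)
        if parts.scheme not in ("http", "https"):
            continue  # already local, or a scheme this sketch does not handle
        name = os.path.basename(parts.path)
        if not name:
            continue  # URL has no file name to save under
        fetch(src, os.path.join(target_dir, name))
        # Simplification: replace the quoted URL wherever it occurs in the text.
        for quote in ('"', "'"):
            html = html.replace(quote + src + quote, quote + name + quote)
    return html


if __name__ == "__main__":
    # Usage (assumed): python localize.py Test.html test2.html test3.html
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            original = f.read()
        folder = os.path.dirname(os.path.abspath(path))
        rewritten = localize_images(original, folder)
        with open(path, "w", encoding="utf-8") as f:
            f.write(rewritten)
```

The script overwrites the HTML files in place, so it is worth trying it on copies first. The fetch parameter exists only so the download step can be swapped out, for example for a call to wget.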

Good luck!