Question

I have a fair number of image search results that I would like to convert into the same number of actual images.

Each result is an HTML page for a single image, and each file contains the substring

<title>Google-Ergebnis für [uri]</title>

where [uri] is the absolute URI of the actual result image (http://...(.gif|.jpg|.jpeg|.bmp)).

But I can't figure out how to extract the URI and hand it to wget. This is what I have so far:

cat imgres.html | grep "<title>" | sed 's/<title>Google-Ergebnis für http://(.*)</title>/\\1/'

Answer 1

grep should do the job here:

....grep "<title>"|grep -Po "(?<=Google-Ergebnis für )[^<]*"

Test:

kent$ echo "<title>Google-Ergebnis für http://foo.bar.baz/blah.png</title>"|grep -Po "(?<=Google-Ergebnis für )[^<]*"
http://foo.bar.baz/blah.png

Note that you can actually merge the two greps into one.
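One way the merged version might look, as a minimal sketch: it assumes GNU grep built with PCRE support (-P), and the function name extract_uri is made up for illustration. The lookbehind simply absorbs the <title> prefix that the first grep was filtering on.

```shell
# Print the image URI found in each result page passed as an argument.
# Assumes GNU grep with -P (PCRE) support; extract_uri is a hypothetical name.
extract_uri() {
  # -h: no file name prefix, -o: print only the matched part, -P: Perl regex.
  # The lookbehind pins the match to the fixed title prefix; [^<]* stops
  # at the "<" that opens </title>.
  grep -hoP '(?<=<title>Google-Ergebnis für )[^<]*' "$@"
}
```

Called as `extract_uri imgres.html`, it should print just the URI on a line by itself.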

Answer 2

You were close with your sed command:

 sed -n 's#<title>Google-Ergebnis für \(http://.*\)</title>#\1#p' imgres.html

There is no need for cat, grep, and multiple pipes.

Answer 3

How about this? I'm assuming your question is how to pass what grep/sed extracted on to wget.

cat imgres.html | grep "<title>" | 
  sed 's#<title>Google-Ergebnis für \(http://.*\)</title>#\1#' |
wget -i -

Slightly more compact:

sed -n '/<title>/{s#.*<title>Google-Ergebnis für \(http://.*\)</title>.*#\1#;p}' imgres.html | 
  wget -i -

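Note the use of # instead of / as the delimiter for s, so the slashes in the URL and in </title> don't need escaping.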
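Since the question mentions many result pages, the same idea can be stretched over several files at once. The following is only a sketch: the helper name list_image_uris and the imgres*.html file pattern are assumptions, not anything from the question.

```shell
# Print one image URI per matching result page; reads every file given as an argument.
# list_image_uris is a hypothetical helper name, not part of any tool.
list_image_uris() {
  # -n suppresses the default output; the trailing p prints a line only when
  # the substitution succeeded, so pages without the expected title are skipped.
  # [^<]* keeps the capture from running past </title>.
  sed -n 's#.*<title>Google-Ergebnis für \(http://[^<]*\)</title>.*#\1#p' "$@"
}

# Usage (assumes the saved pages match imgres*.html in the current directory):
#   list_image_uris imgres*.html | wget -i -
```

wget reads the list of URLs from standard input when given `-i -`, so a single wget process handles all the downloads.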