Question

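I am trying to extract URLs from a large number of Google search results. Getting them from the page source is proving quite challenging, because the delimiters are not clear and not all of the URLs appear in the source. Is there a tool that can extract URLs from a certain area of an image? If so, that may be a better solution.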

Any help would be much appreciated.

Answer 1

Try the JSON/Atom Custom Search API: http://code.google.com/apis/customsearch/v1/overview.html. It gives you 100 API calls per day for free, which can be increased to 10,000 per day if you pay.
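As a rough sketch of what that could look like in PHP: the JSON flavour of the API is queried at https://www.googleapis.com/customsearch/v1 with `key`, `cx` and `q` parameters, and each entry in the response's `items` array carries the result URL in its `link` field. The function names `build_search_url` and `extract_links` are made up for this example, and the API key and search engine ID are placeholders you would have to supply yourself.

```php
<?php
// Build a Custom Search JSON API request URL.
// $apiKey and $engineId (the "cx" parameter) are placeholders: use your own values.
function build_search_url(string $apiKey, string $engineId, string $query, int $start = 1): string
{
    return 'https://www.googleapis.com/customsearch/v1?' . http_build_query([
        'key'   => $apiKey,
        'cx'    => $engineId,
        'q'     => $query,
        'start' => $start, // 1-based index of the first result to return
    ]);
}

// Pull the result URLs out of a raw JSON response body.
// Returns an empty array if the body is not valid JSON or contains no items.
function extract_links(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['items']) || !is_array($data['items'])) {
        return [];
    }
    $links = [];
    foreach ($data['items'] as $item) {
        if (isset($item['link'])) {
            $links[] = $item['link'];
        }
    }
    return $links;
}

// Example usage (performs a real request, which counts against the daily quota):
// $json = file_get_contents(build_search_url('YOUR_API_KEY', 'YOUR_ENGINE_ID', 'example query'));
// print_r(extract_links($json));
```

Each request returns one page of results, so to collect more URLs you would call it again with a larger `start` value, keeping the daily quota in mind.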

Answer 2

Use this excellent lib: http://simplehtmldom.sourceforge.net/manual.htm

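// Load the library (simple_html_dom.php, downloaded from the link above)
include 'simple_html_dom.php';

// Grab the source code
$html = file_get_html('http://www.google.com/');

// Find all anchors; returns an array of element objects
$ret = $html->find('a');

// find() returned an array, so read the href attribute of each element in turn
// (a non-value attribute such as checked or selected returns true or false)
foreach ($ret as $a) {
    echo $a->href . "\n";
}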

Edit:

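All the "natural" search URLs are in the #res div. With simplehtmldom you should be able to find the #res div first and then all the URLs inside it. I don't remember the exact syntax, but it should be something like this: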

// The index 0 makes find() return the first matching element instead of an array
$ret = $html->find('div[id=res]', 0)->find('a');

or

$html->find('div[id=res] a');
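If you would rather not depend on a third-party library, the same idea can be sketched with PHP's built-in DOMDocument and DOMXPath classes. This is a different technique from simplehtmldom, not its API. The function name `links_in_container` and the sample markup are invented for illustration, and the sketch assumes the container id contains no quote characters and that the organic results still live in a div with id "res", which may not hold for current Google pages.

```php
<?php
// Return the href of every <a> nested anywhere inside <div id="$containerId">.
function links_in_container(string $html, string $containerId = 'res'): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML, so collect parser warnings
    // internally instead of letting them be printed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query("//div[@id='" . $containerId . "']//a[@href]") as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}

// Hypothetical sample markup: one navigation link outside #res, two result links inside it.
$sample = '<html><body><a href="/nav">nav</a>'
        . '<div id="res"><a href="http://example.com/1">one</a>'
        . '<p><a href="http://example.com/2">two</a></p></div></body></html>';

print_r(links_in_container($sample));
```

Only the two links inside the #res div are returned; the navigation link outside it is skipped.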
