Question

How can I scan an HTML page and find the text inside a specific div?

Answer 1

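// Load the Simple HTML DOM library, which provides file_get_html()
require_once 'simple_html_dom.php';

// Create a DOM object from a URL
$html = file_get_html('http://www.google.com/');

// Find all <div> elements whose id attribute is "foo"
$ret = $html->find('div[id=foo]');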

Answer 2

Use preg_match() to match the substring you want, or use DOM / XML.
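As a rough illustration of the preg_match() route, here is a minimal sketch. The sample markup and the id "foo" are assumptions made up for the example; in practice the string would come from cURL or file_get_contents(). The lazy pattern stops at the first closing </div>, so it gives wrong results when the target div contains nested divs, which is the main reason a DOM parser is usually the safer choice.

```php
<?php
// Hypothetical sample markup standing in for a fetched page.
$html = '<html><body><div id="foo">Hello <b>world</b></div><div id="bar">Other</div></body></html>';

// Match a <div> whose id attribute is "foo" and lazily capture up to the first </div>.
// Flags: i = case-insensitive, s = let "." match newlines.
$pattern = '/<div\b[^>]*\sid=["\']foo["\'][^>]*>(.*?)<\/div>/is';

$text = null;
if (preg_match($pattern, $html, $matches)) {
    // Remove inner tags and decode entities to leave plain text.
    $text = trim(html_entity_decode(strip_tags($matches[1])));
}

echo $text, PHP_EOL;
```

If the match fails, $text stays null, so callers can tell "div not found" apart from "div is empty".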

Answer 3

You can also do this with the DOMDocument class.

Usage is simple:

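$dom = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML is rarely valid; suppress parser warnings
$dom->loadHTML(file_get_contents($url));

// Example: print the text of the element with id="foo"
$div = $dom->getElementById('foo');
echo $div ? $div->textContent : '';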

The documentation is here.

A real-world usage example can be found here.
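Since the question concerns data fetched with cURL, the DOMDocument approach might be combined with a fetch step roughly as follows. This is only a sketch: fetchHtml() and divTextById() are hypothetical helper names, not part of any library, and the sample markup is invented for the demonstration. It loops over the <div> elements and compares their id attribute rather than relying on getElementById().

```php
<?php
// Hypothetical helper: download a page body with cURL (returns '' on failure).
function fetchHtml(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $body = curl_exec($ch);
    return $body === false ? '' : $body;
}

// Hypothetical helper: trimmed text of the first <div> with the given id, or null if absent.
function divTextById(string $html, string $id): ?string
{
    if ($html === '') {
        return null; // nothing to parse
    }
    $previous = libxml_use_internal_errors(true); // silence warnings from invalid markup
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('div') as $div) {
        if ($div->getAttribute('id') === $id) {
            return trim($div->textContent);
        }
    }
    return null;
}

// Demonstration on an inline sample so the sketch runs without network access.
$sample = '<html><body><div id="foo"> Hello <b>world</b> </div><div id="bar">Other</div></body></html>';
echo divTextById($sample, 'foo'), PHP_EOL;
```

With a live page, the call would look like divTextById(fetchHtml($url), 'foo'), where $url is whatever address you are scraping.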

Answer 4

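You can use the built-in functionality that others have suggested, or you can try Simple HTML DOM Parser, which is implemented as a simple PHP class plus a few helper functions. It supports screen scraping with CSS-style selectors (as in jQuery), can handle invalid HTML, and provides a familiar interface for manipulating the DOM.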

It is worth checking out at http://simplehtmldom.sourceforge.net/

PHP: Data from cURL, HTML scan
