
simple_html_dom does not work as expected

Posted: 2014-08-22 11:41:59

Tags: php web-scraping

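$html = new \simple_html_dom();
$html->load_file('h*ttp://xxx.com/article.html');

// First <div id="content">, then every <p> inside it
$content = $html->find('div[id=content]', 0);
$res = $content ? $content->find('p') : array(); // find() returns null if no div matches

$arr = array(); // result set
foreach ($res as $v) {
    $arr[] = strip_tags($v->plaintext);
}
print_r($arr); // print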

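I want to scrape content from a web page. The content is wrapped in a <div> whose id is 'content', and I retrieve every paragraph wrapped in <p>. However, the div also contains another tag, <figure>, and my results end up including both the <p> and the <figure> elements. The <figure> elements should not be there. What am I doing wrong?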

DOM structure:

<div id="content">
  <p>
  <p>
  <figure>
  <p>
  <figure>
  <p>
  <p>
</div>

1 answer:

Answer 0 (score: 0)

Would this work?

$res = $html->find('#content p');
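In CSS selector syntax, `#content p` matches <p> elements at any depth under the div. If the goal is to keep only the paragraphs that sit directly under <div id="content">, one alternative is to use PHP's built-in DOM extension with an XPath query. This is not the asker's or the answerer's approach, only a sketch: the sample markup is hypothetical and modeled on the DOM structure above, and the real page at the elided URL was not inspected.

```php
<?php
// Sketch: collect only the <p> elements that are direct children of
// <div id="content">, using PHP's built-in DOM extension instead of
// simple_html_dom. The sample HTML is hypothetical and mirrors the
// DOM structure described in the question.

$html = <<<HTML
<html><body>
<div id="content">
  <p>First paragraph</p>
  <p>Second paragraph</p>
  <figure><img src="a.png"><figcaption>Caption A</figcaption></figure>
  <p>Third paragraph</p>
  <figure><img src="b.png"><figcaption>Caption B</figcaption></figure>
  <p>Fourth paragraph</p>
  <p>Fifth paragraph</p>
</div>
</body></html>
HTML;

$doc = new DOMDocument();
// libxml's HTML parser warns about HTML5 tags such as <figure>;
// collect those warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// "/p" (rather than "//p") selects only <p> elements that are direct
// children of the div, so <figure> elements and their contents are skipped.
$nodes = $xpath->query('//div[@id="content"]/p');

$paragraphs = [];
foreach ($nodes as $node) {
    // textContent is the node's text with all tags removed
    $paragraphs[] = trim($node->textContent);
}

print_r($paragraphs);
```

The XPath step `/p` plays the role that a child combinator would play in a CSS selector, so the result holds the five paragraph texts and none of the figure captions.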