I am using simple_html_dom to scrape HTML by class name, and I have run into a problem when trying to grab the ul inside a div.
HTML
<div class="attributes">
    <div class="headline">test header</div>
    <ul>
        <li>test 1</li>
        <li>test 2</li>
        <li>test 3</li>
    </ul>
</div>
PHP
//call to function
$url = 'http://example.com';
$data = dlPage2($url,'.attributes');
echo $data;
//function
function dlPage2($href, $element) {
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($curl, CURLOPT_HEADER, false);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_URL, $href);
    curl_setopt($curl, CURLOPT_REFERER, $href);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.125 Safari/533.4");
    $str = curl_exec($curl);
    curl_close($curl);
    // Create a DOM object
    $dom = new simple_html_dom();
    // Load HTML from a string
    $dom->load($str);
    $dom = $dom->find($element, 0)->outertext;
    return $dom;
}
With the code above I can grab the whole <div class="attributes">, but I need the HTML of the <ul> tag inside that div.
Can anyone help me change this?
Answer 0 (score: 1)
You have to select the <ul> inside $element using:
$dom = $dom->find($element.' ul', 0)->outertext;
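To see the descendant selector in isolation, here is a minimal sketch that runs against the sample HTML from the question instead of a live page. It assumes simple_html_dom.php sits in the same directory as the script. firstListInside is a hypothetical helper name, not something the library provides. Because find() with an index returns null when nothing matches, the sketch checks for that before reading outertext.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory; adjust the path as needed.
require_once __DIR__ . '/simple_html_dom.php';

// Hypothetical helper (not part of the library): returns the outer HTML of the
// first <ul> that is a descendant of an element matching $element, or null.
function firstListInside(string $html, string $element): ?string
{
    $dom = str_get_html($html);   // returns false for empty or oversized input
    if ($dom === false) {
        return null;
    }
    // "$element ul" is a descendant selector: any <ul> nested under $element.
    $ul = $dom->find($element . ' ul', 0);
    // Read outertext before clearing the DOM; find() yields null on no match.
    $result = ($ul === null) ? null : $ul->outertext;
    $dom->clear();                // release the parser's internal references
    return $result;
}

// Sample markup copied from the question.
$html = '<div class="attributes"><div class="headline">test header</div>'
      . '<ul><li>test 1</li><li>test 2</li><li>test 3</li></ul></div>';

echo firstListInside($html, '.attributes'), "\n";   // the <ul> markup only
var_dump(firstListInside($html, '.missing'));       // no match, so null
```

If you only want the <li> items without the surrounding <ul> tags, reading innertext instead of outertext on the matched element should do it.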