I am trying to get the data of the HTML divs whose id starts with a specific name or string.
For example, suppose I have this HTML:
<html>
<div id="post_message_1">
somecontent1
</div>
<div id="post_message_2">
somecontent2
</div>
<div id="post_message_3">
somecontent3
</div>
</html>
For this, I tried cURL:
<?php
function file_get_contents_curl($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$html = file_get_contents_curl("myUrl");
$fh = fopen("test.html", 'w'); // 'w' creates the file (or truncates it if it already exists) and opens it for writing
// write the response into the file
fwrite($fh, $html);
fclose($fh);
?>
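If I use
$select = $doc->getElementById("post_message_");
it returns no data, because getElementById searches the DOM for an element whose id is exactly this string, whereas the div ids only start with it: they are post_message_1, post_message_2, and so on.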
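A small sketch of that behaviour, assuming the PHP DOM extension is available and using the question's sample markup inline instead of a live URL: the prefix lookup finds nothing, while the full id does match.

```php
<?php
// getElementById() compares the whole id value, so a prefix alone never matches.
// The markup is the sample HTML from the question, inlined here instead of fetched with cURL.
libxml_use_internal_errors(true); // keep libxml parser warnings out of the output

$html = '<html><div id="post_message_1">somecontent1</div>'
      . '<div id="post_message_2">somecontent2</div>'
      . '<div id="post_message_3">somecontent3</div></html>';

$doc = new DOMDocument();
$doc->loadHTML($html); // in HTML documents, libxml registers "id" attributes as IDs

$byPrefix = $doc->getElementById('post_message_');  // no element has exactly this id
$byFullId = $doc->getElementById('post_message_1'); // exact match

var_dump($byPrefix);               // NULL
echo $byFullId->textContent, "\n"; // somecontent1
```

So an exact-id lookup cannot express "starts with"; the answers below use either a regex over every div or an XPath `starts-with()` predicate instead.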
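Answer 0 (score: 0)
I would probably iterate over all the divs and run a regex on their ids to pick out the ones I need.
Unless you can edit the page's HTML and add a class to the divs that contain the messages, I don't think there is a cleaner way.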
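A minimal sketch of the "iterate and regex-match the ids" approach, assuming the ids follow the pattern post_message_<digits> (adjust the regex if they do not). The extra div with id "other" is a made-up example added only to show that non-matching divs are skipped.

```php
<?php
// Walk every <div> and keep those whose id matches the regex.
// Assumption: message ids look like post_message_<digits>.
libxml_use_internal_errors(true); // silence libxml warnings about imperfect real-world HTML

$html = <<<HTML
<html>
<div id="post_message_1">somecontent1</div>
<div id="other">not wanted</div>
<div id="post_message_2">somecontent2</div>
</html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);

$messages = []; // id => trimmed text content
foreach ($doc->getElementsByTagName('div') as $div) {
    $id = $div->getAttribute('id'); // empty string when the div has no id
    if (preg_match('/^post_message_\d+$/', $id)) {
        $messages[$id] = trim($div->textContent);
    }
}

print_r($messages);
```

A regex gives stricter control than a plain prefix test (for example, requiring digits after the underscore), at the cost of visiting every div in the page.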
Answer 1 (score: 0)
I would convert the output of file_get_contents_curl into a SimpleXMLElement object and use XPath.
You can do it like this:
$html = <<<HTML
<html>
<div id="post_message_1">
somecontent1
</div>
<div id="post_message_2">
somecontent2
</div>
<div id="post_message_3">
somecontent3
</div>
</html>
HTML;
$dom = new SimpleXMLElement($html);
var_dump($dom->xpath('//div[starts-with(@id, "post_message_")]'));
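Instead of `var_dump()`-ing the matches, the id and text of each matched div can be read directly. A sketch under the same assumption as the answer, namely that the markup is well-formed XML (`new SimpleXMLElement()` throws otherwise):

```php
<?php
// Read the id attribute and text content of each XPath match.
// Assumes well-formed markup, exactly as in the answer above.
$html = <<<HTML
<html>
<div id="post_message_1">
somecontent1
</div>
<div id="post_message_2">
somecontent2
</div>
</html>
HTML;

$dom = new SimpleXMLElement($html);

$found = []; // id => trimmed text
foreach ($dom->xpath('//div[starts-with(@id, "post_message_")]') as $div) {
    // (string) on an attribute yields its value; (string) on the element yields its text
    $found[(string) $div['id']] = trim((string) $div);
}

print_r($found);
```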
Update:
In your case, you should do something like this:
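libxml_use_internal_errors(true); // real-world HTML usually triggers libxml parse warnings; suppress them
$doc = new DOMDocument();
$doc->loadHTML(file_get_contents_curl($url));
$sxml = simplexml_import_dom($doc);
var_dump($sxml->xpath('//div[starts-with(@id, "post_message_")]'));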
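Why the update parses with `DOMDocument::loadHTML()` first: `SimpleXMLElement` uses the strict XML parser, while real pages are rarely well-formed XML. A sketch, where the unclosed `<br>` is an assumed example of such markup:

```php
<?php
// Compare the strict XML parser with the forgiving HTML parser on the same string.
// The unclosed <br> is a hypothetical example of typical non-XML HTML.
libxml_use_internal_errors(true); // keep libxml warnings out of the output

$html = '<html><div id="post_message_1">line one<br>line two</div></html>';

try {
    new SimpleXMLElement($html);
    $strictParseOk = true;
} catch (Exception $e) {
    $strictParseOk = false; // the XML parser rejects the unclosed <br>
}

$doc = new DOMDocument();
$doc->loadHTML($html);          // the HTML parser repairs the markup
$sxml = simplexml_import_dom($doc);
$divs = $sxml->xpath('//div[starts-with(@id, "post_message_")]');

var_dump($strictParseOk); // bool(false)
echo count($divs), "\n";  // 1
```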
Answer 2 (score: 0)
I found a solution and it works. Maybe this code will help someone else. Thanks to @smarber, whose XPath pattern helped me solve this.
<?php
function file_get_contents_curl($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$html = file_get_contents_curl("myUrl");
libxml_use_internal_errors(true); // suppress libxml warnings caused by imperfect real-world HTML
$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);
$nodes = $finder->query('//div[starts-with(@id, "post_message_")]');
$tmp_dom = new DOMDocument();
foreach ($nodes as $node)
{
$tmp_dom->appendChild($tmp_dom->importNode($node,true));
}
$innerHTML = trim($tmp_dom->saveHTML());
$fh = fopen("test.html", 'w'); // 'w' creates the file (or truncates it if it already exists) and opens it for writing
// write the matched divs into the file
fwrite($fh, $innerHTML);
fclose($fh);
?>
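If each message is needed separately rather than as one concatenated string, `DOMDocument::saveHTML()` can serialize a single node. A sketch of that variant, with the sample markup inlined in place of the cURL fetch; the `$posts` array is a hypothetical name:

```php
<?php
// Collect the outer HTML of each matched div, keyed by its id.
// saveHTML($node) serializes only the given node instead of the whole document.
libxml_use_internal_errors(true);

$html = <<<HTML
<html>
<div id="post_message_1">somecontent1</div>
<div id="post_message_2">somecontent2</div>
</html>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);

$posts = []; // id => outer HTML of the div
foreach ($finder->query('//div[starts-with(@id, "post_message_")]') as $node) {
    $posts[$node->getAttribute('id')] = $dom->saveHTML($node);
}

print_r($posts);
```

This avoids the temporary document and keeps the id attached to each message, which is convenient when the posts are stored or processed one by one.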