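I'm trying to fetch data from a URL and retrieve only the data inside spans that have a title="" attribute. Each "row" of data has one such span, and the title value increments from row to row, for example:
title="1", title="2"
So the data I want is inside a span like <span title="x">data here</span>, where x is an incrementing number.
I can fetch all the data from the page with this code, but I'm stuck on how to extract only what I need: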
function file_get_contents_curl($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);         // do not include response headers in the output
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body as a string instead of printing it
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
$html = file_get_contents_curl("http://www.example.com");
// parsing all content:
$doc = new DOMDocument();
@$doc->loadHTML($html);
echo $html;
The data is formatted like this:
<span id="RANDOMINFO">
<a href="/DEMO/RANDOMDATA">+</a>
<span title="1">DATA I WANT HERE</span>
<a href="https://URL.COM/RANDOM">CLICK</a>
<a href="https://URL.COM/RANDOM">RANDOM DATA</a>
</span>
<span id="RANDOMINFO">
<a href="/DEMO/RANDOMDATA">+</a>
<span title="2">DATA I WANT HERE</span>
<a href="https://URL.COM/RANDOM">CLICK</a>
<a href="https://URL.COM/RANDOM">RANDOM DATA</a>
</span>
Answer 0 (score: 0)
Solution: the explanation is given as comments in the code below.
$doc = new DOMDocument();
@$doc->loadHTML($html); // @ suppresses warnings about malformed markup
foreach ($doc->getElementsByTagName('span') as $element) { // Loop through every span element
    // Skip the wrapper spans, identified by their `id`. getAttribute() returns '' when the
    // attribute is absent, so the target spans (which have no `id`) pass this check.
    if ($element->getAttribute('id') !== 'RANDOMINFO') {
        echo get_inner_html($element) . PHP_EOL;
    }
}
function get_inner_html($node) {
    $innerHTML = '';
    foreach ($node->childNodes as $child) {
        $innerHTML .= $child->ownerDocument->saveHTML($child); // Serialize each child node (text or markup) back to HTML
    }
    return $innerHTML;
}
Output:
DATA I WANT HERE
DATA I WANT HERE
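Since the question asks specifically for spans that carry a title attribute, another option is to select on that attribute directly with DOMXPath instead of excluding spans by id. The following is only a minimal sketch under a few assumptions: the helper name extract_titled_spans and the inline sample markup are hypothetical, and the real page is assumed to follow the structure shown in the question.

```php
<?php
// Hypothetical sample markup mirroring the structure shown in the question.
$html = <<<HTML
<span id="RANDOMINFO">
<a href="/DEMO/RANDOMDATA">+</a>
<span title="1">DATA I WANT HERE</span>
<a href="https://URL.COM/RANDOM">CLICK</a>
</span>
<span id="RANDOMINFO">
<a href="/DEMO/RANDOMDATA">+</a>
<span title="2">DATA I WANT HERE</span>
<a href="https://URL.COM/RANDOM">CLICK</a>
</span>
HTML;

// Hypothetical helper: returns the text of every span that has a title
// attribute, keyed by the title value.
function extract_titled_spans(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];
    // '//span[@title]' matches span elements at any depth that have a title attribute.
    foreach ($xpath->query('//span[@title]') as $span) {
        $result[$span->getAttribute('title')] = trim($span->textContent);
    }
    return $result;
}

foreach (extract_titled_spans($html) as $title => $text) {
    echo $title . ': ' . $text . PHP_EOL;
}
```

Keying the result by title keeps the incrementing number alongside its data. If two spans on the real page could share the same title value, appending to a plain list instead would avoid overwriting earlier entries.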