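Hi, I just want to extract a specific div class from a particular website.
This is what I have, but for some reason it doesn't work and I get a lot of errors: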
$page = file_get_contents('https://extcall.17track.net/en/track#apitype=1&nums=RK444760227FR');
$doc = new DOMDocument();
$doc->loadHTML($page);
$divs = $doc->getElementsByTagName('div');
foreach($divs as $div) {
// Loop through the DIVs looking for one with a class of "tracklist-fill"
// Then echo out its contents (pardon the pun)
if ($div->getAttribute('class') === 'tracklist-fill') {
echo $div->nodeValue;
}
}
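What I want to extract is just the tracking results, without the branding, headers, or other elements.
What am I doing wrong?
Cheers
These are the errors I get: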
Warning: DOMDocument::loadHTML(): Tag main invalid in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Tag section invalid in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Tag section invalid in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Unexpected end tag : p in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Unexpected end tag : p in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7
This is the HTML snippet from the website shown in the file_get_contents() call above:
body> main> div> section.yq-panel.yq-panel-tracklist.jcTrackContainer> div> div.tracklist-fill
<div class="tracklist-fill">
<div class="tracklist-ps-transit">
<div class="yqcr-ps" data-ps="10"><a class="btn btn-icon fa-PS_10 ps-bgcolor-10 waves-effect" title="In transit"
href="//help.17track.net/hc/en-us/articles/228084227#10"
yqg-events="{C:功能操作,A:结果页-查看帮助,L:包裹状态_10}" target="_blank"
data-icon=""></a>
<div data-name=""><p class="text-uppercase" title="RK444760227FR">RK444760227FR</p>
<p class="text-capitalize" title="In transit">In transit</p></div>
</div>
<div class="yqcr-transit">
<div class="from" data-key="06051">
<div class="base-info" data-carrier-type="fc">
<div><span title="France" data-country="">France</span> <i title="La Poste">La Poste</i></div>
</div>
<div class="action-info"><a class="btn btn-icon btn-pure btn-default fa-home waves-effect waves-circle"
target="_blank" href="http://www.laposte.fr/"
yqg-events="{C:功能操作,A:结果页-跳转运输商官网,L:06051}"
title="Go to the carrier's official website."> </a></div>
</div>
<div class="to" data-key="07071">
<div class="base-info" data-carrier-type="sc">
<div><span title="Greece" data-country="">Greece</span> <i title="ELTA">ELTA</i></div>
</div>
<div class="action-info"><a class="btn btn-icon btn-pure btn-default fa-home waves-effect waves-circle"
target="_blank" href="http://www.elta.gr/"
yqg-events="{C:功能操作,A:结果页-跳转运输商官网,L:07071}"
title="Go to the carrier's official website."> </a></div>
</div>
</div>
</div>
<div class="tracklist-events scrollable is-enabled scrollable-vertical" yq-data="scrollBox"
style="position: relative;">
<div class="scrollable-container" style="height: 360px; width: 909px;">
<div class="scrollable-content" style="width: 892px;">
<div class="hide"><p data-newevents="">FRANCE, DEPARTURE FROM OUTWARD OFFICE OF EXCHANGE</p>
<time data-newtime="">2018-12-11 07:15</time>
</div>
<div class="yqcr-details">
<dl class="des-block" data-from="en">
<dt><span>Destination</span> <span>: Greece</span> <span>- Tracking consuming: 958 ms</span>
</dt>
<dd class="new"><i></i>
<div>
<time>2018-12-11 07:15</time>
<p>FRANCE, DEPARTURE FROM OUTWARD OFFICE OF EXCHANGE</p></div>
</dd>
<dd class=""><i></i>
<div>
<time>2018-12-08 09:07</time>
<p>FRANCE, POSTING/COLLECTION</p></div>
</dd>
</dl>
<dl class="ori-block" data-from="fr">
<dt><span>Origin</span> <span>: France</span> <span>- Tracking consuming: 1452 ms</span></dt>
<dd class=""><i></i>
<div>
<time>2018-12-08 00:00</time>
<p>CHAMPAGNOLE, Pris en charge</p></div>
</dd>
</dl>
</div>
</div>
</div>
<div class="scrollable-bar scrollable-bar-vertical is-disabled scrollable-bar-hide" draggable="false">
<div class="scrollable-bar-handle"></div>
</div>
</div>
</div>
These are the elements I want, as shown in the picture.
Answer 0 (score: 0)
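You get all these errors because the HTML you are trying to parse is invalid, i.e. it is missing required tags and so on. Warnings such as "Tag main invalid" also show up because the libxml2 HTML parser behind DOMDocument has historically not recognized HTML5 elements like main and section.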
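As a minimal sketch of how to keep those warnings out of the output, the libxml error buffer can be switched on before calling loadHTML(). The markup in $html is made up for illustration and is not the real 17track page. This only silences the parser; it does not add any data that is missing from the page.

```php
<?php
// Hypothetical sample markup using HTML5 tags, standing in for a downloaded page.
$html = '<html><body><main><section><div class="tracklist-fill">In transit</div></section></main></body></html>';

// Collect parser messages internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// The messages remain available for inspection if they are needed.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Same lookup as in the question: exact match on the class attribute.
$found = [];
foreach ($doc->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('class') === 'tracklist-fill') {
        $found[] = trim($div->nodeValue);
    }
}

echo implode("\n", $found), "\n";
```

Restoring the previous setting afterwards keeps the change local to this one parse.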
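Update:
After checking the content of the page you are parsing, I can see that the information you are interested in is rendered by JavaScript in the browser. The HTML actually returned contains only some templates and no tracking data.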
<script type="text/template" id="tracking-loading-tpl">
<%for(var i = 0,len = arrTrackNums.length; i < len; i++){%>
<div class="tracklist-item tracklist-tracking"
data-tracknumber="<%=arrTrackNums[i]%>"
data-trackitem="<%=arrTrackNums[i]%>">
<div class="tracklist-fill">
<div class="tracklist-ps-transit"> <%==packageStatus[i]%></div>
<div class="yqcr-loading-list"> <%==loading%></div>
</div>
<div class="tracklist-da">
<div class="gad-container" id="DA_V6-Extcall-Track"></div>
</div>
<%==action%>
</div>
<%}%>
</script>
So you cannot get the data by loading the page with file_get_contents() and DOMDocument.
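One more detail points the same way: everything after the # in that URL is a fragment, and fragments are not sent to the server as part of an HTTP request, so the tracking number is only ever seen by the script running in the browser. A small sketch using parse_url() to show how the URL from the question splits up (the variable names are only illustrative):

```php
<?php
$url = 'https://extcall.17track.net/en/track#apitype=1&nums=RK444760227FR';
$parts = parse_url($url);

// The part a client actually requests from the server: scheme, host and path.
$requested = $parts['scheme'] . '://' . $parts['host'] . $parts['path'];

// The fragment stays on the client; here it is what carries the tracking number.
parse_str($parts['fragment'], $fragmentParams);

echo "Requested: $requested\n";
echo "Fragment:  {$parts['fragment']}\n";
echo "Tracking number (client-side only): {$fragmentParams['nums']}\n";
```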
Original:
You can use HTML Tidy to clean up the HTML:
$page = file_get_contents('https://extcall.17track.net/en/track#apitype=1&nums=RK444760227FR');
$config = array(
'output-html' => 'yes',
'clean' => 'yes',
);
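// Parse the downloaded markup held in $page
$tidy = tidy_parse_string($page, $config, 'utf8');
$tidy->cleanRepair();
$doc = new DOMDocument();
// loadHTML() expects a string, so pass the repaired markup explicitly
$doc->loadHTML(tidy_get_output($tidy));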
$divs = $doc->getElementsByTagName('div');
foreach($divs as $div) {
// Loop through the DIVs looking for one with a class of "tracklist-fill"
// Then echo out its contents (pardon the pun)
if ($div->getAttribute('class') === 'tracklist-fill') {
echo $div->nodeValue;
}
}
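If the rendered markup is available by some other means (for example saved from the browser), the individual events can be pulled out with DOMXPath rather than echoing the whole div. This is only a sketch: $html holds a trimmed-down copy of the snippet from the question and is assumed to have been obtained already. The class test also tolerates elements that carry several classes.

```php
<?php
// Trimmed copy of the rendered snippet from the question (assumed to be already obtained).
$html = <<<'HTML'
<div class="tracklist-fill">
  <dl class="des-block" data-from="en">
    <dd class="new"><i></i><div><time>2018-12-11 07:15</time>
      <p>FRANCE, DEPARTURE FROM OUTWARD OFFICE OF EXCHANGE</p></div></dd>
    <dd class=""><i></i><div><time>2018-12-08 09:07</time>
      <p>FRANCE, POSTING/COLLECTION</p></div></dd>
  </dl>
</div>
HTML;

libxml_use_internal_errors(true);   // keep parser warnings about HTML5 tags quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Match "tracklist-fill" as one class token among possibly several.
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " tracklist-fill ")]//dd';

$events = [];
foreach ($xpath->query($query) as $dd) {
    // Each event row holds a timestamp and a description.
    $time = $xpath->query('.//time', $dd)->item(0);
    $text = $xpath->query('.//p', $dd)->item(0);
    if ($time !== null && $text !== null) {
        $events[] = trim($time->textContent) . ' | ' . trim($text->textContent);
    }
}

echo implode("\n", $events), "\n";
```

Collecting the rows into $events first makes it easy to format or store them instead of printing them directly.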