Extract a specific div class with PHP DOM

Date: 2018-12-13 22:33:28

Tags: php html dom extract

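Hi, I just want to extract a specific div class from a specific website.

This is what I have, but for some reason it does not work and I get a lot of errors: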

$page = file_get_contents('https://extcall.17track.net/en/track#apitype=1&nums=RK444760227FR');
$doc = new DOMDocument();
$doc->loadHTML($page);
$divs = $doc->getElementsByTagName('div');
foreach($divs as $div) {
    // Loop through the DIVs looking for one with a class of "tracklist-fill"
    // Then echo out its contents (pardon the pun)
    if ($div->getAttribute('class') === 'tracklist-fill') {
         echo $div->nodeValue;
    }
}

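What I want to extract is only the tracking results, without the branding, headings, or other elements.

What am I doing wrong?

Cheers

These are the errors I get: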

Warning: DOMDocument::loadHTML(): Tag main invalid in Entity, line: 1 in /volume1/web/track/test3.php on line 7
Warning: DOMDocument::loadHTML(): Tag section invalid in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Tag section invalid in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Unexpected end tag : p in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Unexpected end tag : p in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7 
Warning: DOMDocument::loadHTML(): Unexpected end tag : div in Entity, line: 1 in /volume1/web/track/test3.php on line 7

This is the HTML snippet from the website passed to file_get_contents above:

body> main> div> section.yq-panel.yq-panel-tracklist.jcTrackContainer> div> div.tracklist-fill

<div class="tracklist-fill">
    <div class="tracklist-ps-transit">
        <div class="yqcr-ps" data-ps="10"><a class="btn btn-icon fa-PS_10 ps-bgcolor-10 waves-effect" title="In transit"
                                             href="//help.17track.net/hc/en-us/articles/228084227#10"
                                             yqg-events="{C:功能操作,A:结果页-查看帮助,L:包裹状态_10}" target="_blank"
                                             data-icon=""></a>
            <div data-name=""><p class="text-uppercase" title="RK444760227FR">RK444760227FR</p>
                <p class="text-capitalize" title="In transit">In transit</p></div>
        </div>
        <div class="yqcr-transit">
            <div class="from" data-key="06051">
                <div class="base-info" data-carrier-type="fc">
                    <div><span title="France" data-country="">France</span> <i title="La Poste">La Poste</i></div>
                </div>
                <div class="action-info"><a class="btn btn-icon btn-pure btn-default fa-home waves-effect waves-circle"
                                            target="_blank" href="http://www.laposte.fr/"
                                            yqg-events="{C:功能操作,A:结果页-跳转运输商官网,L:06051}"
                                            title="Go to the carrier's official website."> </a></div>
            </div>
            <div class="to" data-key="07071">
                <div class="base-info" data-carrier-type="sc">
                    <div><span title="Greece" data-country="">Greece</span> <i title="ELTA">ELTA</i></div>
                </div>
                <div class="action-info"><a class="btn btn-icon btn-pure btn-default fa-home waves-effect waves-circle"
                                            target="_blank" href="http://www.elta.gr/"
                                            yqg-events="{C:功能操作,A:结果页-跳转运输商官网,L:07071}"
                                            title="Go to the carrier's official website."> </a></div>
            </div>
        </div>
    </div>
    <div class="tracklist-events scrollable is-enabled scrollable-vertical" yq-data="scrollBox"
         style="position: relative;">
        <div class="scrollable-container" style="height: 360px; width: 909px;">
            <div class="scrollable-content" style="width: 892px;">
                <div class="hide"><p data-newevents="">FRANCE, DEPARTURE FROM OUTWARD OFFICE OF EXCHANGE</p>
                    <time data-newtime="">2018-12-11 07:15</time>
                </div>
                <div class="yqcr-details">
                    <dl class="des-block" data-from="en">
                        <dt><span>Destination</span> <span>: Greece</span> <span>- Tracking consuming: 958 ms</span>
                        </dt>
                        <dd class="new"><i></i>
                            <div>
                                <time>2018-12-11 07:15</time>
                                <p>FRANCE, DEPARTURE FROM OUTWARD OFFICE OF EXCHANGE</p></div>
                        </dd>
                        <dd class=""><i></i>
                            <div>
                                <time>2018-12-08 09:07</time>
                                <p>FRANCE, POSTING/COLLECTION</p></div>
                        </dd>
                    </dl>
                    <dl class="ori-block" data-from="fr">
                        <dt><span>Origin</span> <span>: France</span> <span>- Tracking consuming: 1452 ms</span></dt>
                        <dd class=""><i></i>
                            <div>
                                <time>2018-12-08 00:00</time>
                                <p>CHAMPAGNOLE, Pris en charge</p></div>
                        </dd>
                    </dl>
                </div>
            </div>
        </div>
        <div class="scrollable-bar scrollable-bar-vertical is-disabled scrollable-bar-hide" draggable="false">
            <div class="scrollable-bar-handle"></div>
        </div>
    </div>
</div>

These are the elements I want, as shown in this picture:

https://imgur.com/ajblnNV

1 answer:

Answer 0 (score: 0)

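You get all these errors because the HTML you are trying to parse is invalid as far as the parser is concerned: required tags are missing, end tags are unmatched, and so on.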
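
Independently of whether the markup can be repaired, the warnings themselves can be collected instead of printed. The following is a minimal sketch, assuming a small made-up `$page` string in place of the downloaded page; `libxml_use_internal_errors()` tells libxml to buffer parse errors so they can be inspected with `libxml_get_errors()`. Which messages end up in the buffer depends on the libxml2 version in use, so the sketch does not rely on any particular one.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page:
// HTML5 tags (main, section) plus one stray closing </div>.
$page = '<html><body><main><section>'
      . '<div class="tracklist-fill">In transit</div>'
      . '</section></main></div></body></html>';

// Buffer parser errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($page);

// Whatever the parser complained about is available here for logging.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous); // restore the earlier setting

// Same lookup as in the question, collecting matches instead of echoing them.
$found = [];
foreach ($doc->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('class') === 'tracklist-fill') {
        $found[] = trim($div->nodeValue);
    }
}

print_r($found);
```

This only silences the noise; it does not change what the document contains.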

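Update:

After checking the content of the page you are parsing, I can see that the information you are interested in is rendered by JavaScript in the browser. The HTML that is actually returned only contains some templates, with no tracking data.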

<script type="text/template" id="tracking-loading-tpl">
    <%for(var i = 0,len = arrTrackNums.length; i < len; i++){%>
        <div class="tracklist-item tracklist-tracking"
            data-tracknumber="<%=arrTrackNums[i]%>"
            data-trackitem="<%=arrTrackNums[i]%>">
            <div class="tracklist-fill">
                <div class="tracklist-ps-transit"> <%==packageStatus[i]%></div>
                <div class="yqcr-loading-list"> <%==loading%></div>
            </div>
            <div class="tracklist-da">
                <div class="gad-container" id="DA_V6-Extcall-Track"></div>
            </div>
            <%==action%>
        </div>
    <%}%>
</script>

So you cannot get the data by loading the page with file_get_contents() and DOMDocument.
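
A related detail: the tracking number in the question's URL sits in the fragment (the part after `#`), and HTTP clients do not send the fragment to the server; it is only available to scripts running in the browser. The sketch below uses `parse_url()` and `parse_str()` purely to illustrate which part of the URL is requested and which part stays on the client. The variable names are illustrative only.

```php
<?php
// The URL from the question; everything after '#' is the fragment.
$url = 'https://extcall.17track.net/en/track#apitype=1&nums=RK444760227FR';
$parts = parse_url($url);

// What an HTTP request targets: scheme, host and path
// (plus a query string, which this URL does not have).
$requested = $parts['scheme'] . '://' . $parts['host'] . $parts['path'];

// The fragment never leaves the client; it is decoded here only for illustration.
parse_str($parts['fragment'], $fragmentParams);

echo $requested, "\n";
print_r($fragmentParams);
```

The server therefore receives the same request regardless of the tracking number, which is consistent with the response containing only templates.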

Original:

You can use HTML Tidy to clean up the HTML:

$page = file_get_contents('https://extcall.17track.net/en/track#apitype=1&nums=RK444760227FR');

$config = array(
    'output-html' => 'yes',
    'clean' => 'yes',
);
$tidy = tidy_parse_string($page, $config, 'utf8');
$tidy->cleanRepair();

$doc = new DOMDocument();
$doc->loadHTML(tidy_get_output($tidy));
$divs = $doc->getElementsByTagName('div');
foreach($divs as $div) {
    // Loop through the DIVs looking for one with a class of "tracklist-fill"
    // Then echo out its contents (pardon the pun)
    if ($div->getAttribute('class') === 'tracklist-fill') {
         echo $div->nodeValue;
    }
}
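
If the rendered markup is available from some other source, the exact comparison `getAttribute('class') === 'tracklist-fill'` only matches elements whose class attribute is exactly that string, and `nodeValue` returns all nested text at once. A minimal sketch of a more selective lookup with `DOMXPath` follows, assuming a trimmed copy of the rendered snippet from the question as input; the `$html` sample and the `$events` array are illustrative, not part of the original code.

```php
<?php
// Trimmed sample based on the rendered snippet shown in the question.
$html = <<<'HTML'
<div class="tracklist-fill">
  <div class="tracklist-events scrollable">
    <dl class="des-block">
      <dd class="new"><div><time>2018-12-11 07:15</time><p>FRANCE, DEPARTURE FROM OUTWARD OFFICE OF EXCHANGE</p></div></dd>
      <dd class=""><div><time>2018-12-08 09:07</time><p>FRANCE, POSTING/COLLECTION</p></div></dd>
    </dl>
  </div>
</div>
HTML;

libxml_use_internal_errors(true); // keep any parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Match "tracklist-events" as one token of a multi-valued class attribute.
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' tracklist-events ')]//dd";

// Collect one "time text" line per tracking event.
$events = [];
foreach ($xpath->query($query) as $dd) {
    $time = $xpath->query('.//time', $dd)->item(0);
    $text = $xpath->query('.//p', $dd)->item(0);
    if ($time !== null && $text !== null) {
        $events[] = trim($time->nodeValue) . ' ' . trim($text->nodeValue);
    }
}

print_r($events);
```

The `concat`/`normalize-space` expression is the usual XPath 1.0 way to test for a single class token, since XPath 1.0 has no built-in class matching.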