How to web-scrape the DT and DD of a DL under a div using DOMParser / XPath

Time: 2018-10-02 03:13:08

Tags: php xpath web-scraping domparser

I am trying to get the DT and DD of a DL with a given class and process them in a foreach loop, but I am running into some trouble.

<dl class="c-explain2">
        <dt>所在地</dt>
                <dd>
                    大阪府大阪市 北区天満1丁目25番1(地番)
                        <br>

Here is my code:

$DOMParser = new \DOMDocument();
$DOMParser->loadHTML($html);
$xpath = new \DOMXPath($DOMParser);

$classname="c-explain2";
$getAllTable = $xpath->query("//dl[contains(@class, '$classname')]//");

foreach($getAllTable as $table){
            $allProperties = [];

            $table->getElementsByTagName('dt')[0]->nodeValue;

            $value = $table->getElementsByTagName('dd')[0]->nodeValue;
            $allProperties[] = [
                    'property' => $property, 
                    'value'=> $value];
            }
                $insertData[$start_id] = $allProperties;
                $MyTable = true; 

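How can I get those dt and dd elements and then put them into an array? Any help is appreciated. Thanks.

1 Answer:

Answer 0 (score: 0)

Your XPath expression is malformed: the trailing "//" has no step after it, so the query is invalid. It should be "//dl[@class='$classname']"

Also, you never seem to assign $property inside the loop. Try this: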
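One caveat on the exact-match form: @class='c-explain2' only matches when the attribute is exactly that string, while contains(@class, ...) is a plain substring test. If a page puts several classes on the dl, a token-wise test is a common alternative. The following is a minimal sketch comparing the three; the helper name classTokenQuery and the sample markup are made up for illustration and are not part of the original post.

```php
<?php
// Hypothetical helper (not from the original answer): builds an XPath
// expression matching elements whose class attribute contains $class
// as a whole, space-separated token.
function classTokenQuery(string $tag, string $class): string
{
    return "//{$tag}[contains(concat(' ', normalize-space(@class), ' '), ' {$class} ')]";
}

// Made-up sample markup with three dl elements.
$html = '<div>'
      . '<dl class="c-explain2"><dt>A</dt><dd>1</dd></dl>'
      . '<dl class="c-explain2 wide"><dt>B</dt><dd>2</dd></dl>'
      . '<dl class="c-explain25"><dt>C</dt><dd>3</dd></dl>'
      . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Exact attribute match: only class="c-explain2".
$exactCount = $xpath->query("//dl[@class='c-explain2']")->length;
// Token match: also accepts class="c-explain2 wide".
$tokenCount = $xpath->query(classTokenQuery('dl', 'c-explain2'))->length;
// Substring match: additionally accepts class="c-explain25".
$looseCount = $xpath->query("//dl[contains(@class, 'c-explain2')]")->length;

echo "$exactCount $tokenCount $looseCount\n"; // prints "1 2 3"
```

Which form is appropriate depends on the markup being scraped; for the single-class dl shown in the question, the exact match is enough.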

<?php
$html = <<<END
<dl class="c-explain2">
        <dt>所在地</dt>
        <dd>大阪府大阪市 北区天満1丁目25番1(地番)</dd>
</dl>
END;

$DOMParser = new \DOMDocument();
$DOMParser->loadHTML($html);
$xpath = new \DOMXPath($DOMParser);

$classname   = "c-explain2";
$getAllTable = $xpath->query("//dl[@class='$classname']");

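// Initialise once, before the loop, so pairs from earlier matches are kept.
$allProperties = [];

foreach ($getAllTable as $table)
{
    // Text of the first <dt> and the first <dd> inside this <dl>.
    $property = $table->getElementsByTagName('dt')[0]->nodeValue;
    $value    = $table->getElementsByTagName('dd')[0]->nodeValue;

    $allProperties[] = [
        'property' => $property,
        'value'    => $value
    ];
}

print_r($allProperties);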
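
If a single dl holds several dt/dd pairs, indexing with [0] only returns the first pair. Below is a sketch of one way to collect every pair, assuming each dt is directly followed by its own dd. The function name extractPairs and the ASCII sample values are illustrative and do not come from the original post.

```php
<?php
// Hypothetical helper: returns one ['property' => ..., 'value' => ...]
// entry per <dt> found directly inside any <dl> with the given class.
function extractPairs(string $html, string $classname): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them,
    // since scraped markup is often imperfect.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $pairs = [];

    foreach ($xpath->query("//dl[@class='$classname']/dt") as $dt) {
        // The second argument makes the query relative to this <dt>.
        $dd = $xpath->query('following-sibling::dd[1]', $dt)->item(0);
        $pairs[] = [
            'property' => trim($dt->textContent),
            'value'    => $dd !== null ? trim($dd->textContent) : null,
        ];
    }

    return $pairs;
}

// Made-up sample markup with two dt/dd pairs in one dl.
$html = <<<END
<dl class="c-explain2">
    <dt>Location</dt>
    <dd>
        Osaka, Kita-ku
        <br>
    </dd>
    <dt>Access</dt>
    <dd>5 min walk</dd>
</dl>
END;

$pairs = extractPairs($html, 'c-explain2');
print_r($pairs);
```

The trim() calls strip the whitespace and line breaks that surround the text inside the dd, which the markup in the question also contains.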