How do I get multiple pieces of data from an XPath query?

Asked: 2015-03-14 20:50:19

Tags: php xpath web-scraping scraper

Here is the HTML page (test.html):

<div id = 'mainid'>
    <div id = 'subid'>
        Name: ABC
    </div>
    <div id = 'subid'>
        Country: USA
    </div>
    <div id = 'subid'>
        Date of birth: 15 Feb 1985
    </div>
</div>
<div id = 'mainid'>
    <div id = 'subid'>
        Name: Jisan
    </div>
    <div id = 'subid'>
        Country: Japan
    </div>
    <div id = 'subid'>
        Date of birth: 15 Feb 1985
    </div>
</div>
<div id = 'mainid'>
    <div id = 'subid'>
        Name: Mr Barman
    </div>
    <div id = 'subid'>
        Country: Canada
    </div>
    <div id = 'subid'>
        Date of birth: 15 Feb 1985
    </div>
</div>

Here is the PHP code:

$file = $DOCUMENT_ROOT. "test.html";
$doc = new DOMDocument();
$doc->loadHTMLFile($file);

$xpath = new DOMXpath($doc);

$Querys = $xpath->query("*//div[@id='mainid']");
foreach ($Querys as $Query) {
    // Please help me with this part: how do I get these values from $Query?
    // echo $Name    = ...;
    // echo $Country = ...;
    // echo $DOB     = ...;
}

Note: I want to get a result like this:

Name: ABC, Country: USA, Date of birth: 15 Feb 1985.
Name: Jisan, Country: Japan, Date of birth: 15 Feb 1985.
Name: Mr Barman, Country: Canada, Date of birth: 15 Feb 1985.

1 answer:

Answer 0 (score: 1)

One approach is to use the contextnode parameter of DOMXPath::query to run a sub-query for the subid child elements on each mainid element, like this:

$mainElements = $xpath->query("*//div[@id='mainid']");
foreach ($mainElements as $mainElement) {
    $subElements = $xpath->query("div[@id='subid']", $mainElement);

    if ($subElements && $subElements->length == 3) {
        // item() is the documented accessor for DOMNodeList entries.
        $Name = trim($subElements->item(0)->nodeValue);
        $Country = trim($subElements->item(1)->nodeValue);
        $DOB = trim($subElements->item(2)->nodeValue);
        echo "$Name, $Country, $DOB\n";
    } else {
        echo "Invalid number of sub-elements.\n";
    }   
}

Note that the trim calls are necessary; otherwise all of the whitespace from the original document ends up in the output.
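
As a possible variation on the same context-node idea, here is a minimal, self-contained sketch that loads a shortened copy of test.html from a string and produces the comma-separated lines requested in the question. The inline `$html` string, the `$lines` and `$fields` variables, and the use of XPath's `normalize-space()` in place of `trim()` are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Shortened inline copy of test.html so the sketch runs on its own (assumption).
$html = <<<HTML
<div id='mainid'>
    <div id='subid'>
        Name: ABC
    </div>
    <div id='subid'>
        Country: USA
    </div>
    <div id='subid'>
        Date of birth: 15 Feb 1985
    </div>
</div>
<div id='mainid'>
    <div id='subid'>
        Name: Jisan
    </div>
    <div id='subid'>
        Country: Japan
    </div>
    <div id='subid'>
        Date of birth: 15 Feb 1985
    </div>
</div>
HTML;

$doc = new DOMDocument();
// The markup reuses ids, which libxml may complain about;
// collect such messages internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$lines = [];
foreach ($xpath->query("//div[@id='mainid']") as $mainElement) {
    $fields = [];
    // The relative path is resolved against $mainElement, the context node.
    foreach ($xpath->query("div[@id='subid']", $mainElement) as $subElement) {
        // normalize-space(.) trims the ends and collapses inner whitespace runs.
        $fields[] = $xpath->evaluate('normalize-space(.)', $subElement);
    }
    $lines[] = implode(', ', $fields) . '.';
}

echo implode("\n", $lines), "\n";
```

Unlike the version above, this sketch does not require exactly three sub-elements per record; it simply joins however many it finds, so a length check would need to be added back if malformed records should be reported.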