Question

I'm having trouble using XPath and DOMDocument to get the links that match a given word. Everything seems to work up to the point where for($i=0;$i<$documentLinks->length;$i++){ is used.

Can anyone help me see where I'm going wrong here?

$html  = '<ol>';
$html .= '  <li id="stuff-123"> some copy here </li>';
$html .= '  <li id="stuff-456"> some copy here <a href="http://domain.com">domain</a> </li>';
$html .= '  <li id="stuff-789"> some copy here </li>';
$html .= '</ol>';


    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom); 
    $result = $xpath->query('//ol/li[starts-with(@id, "stuff")]');
    foreach($result as $e){
        $documentLinks = $e->getElementsByTagName('a')->item(0)->nodeValue;
        for($i=0;$i<$documentLinks->length;$i++){
            $documentLink = $documentLinks->item($i);
            if(preg_match("/domain/i", $documentLink->getAttribute("href"))){
              echo $documentLink->getAttribute("href") . "\n";
            }
        }
    }

Answer 1

The line: $documentLinks = $e->getElementsByTagName('a')->item(0)->nodeValue;

should be: $documentLinks = $e->getElementsByTagName('a');

$e->getElementsByTagName('a')

returns all descendant elements of $e whose tag is <a ...>, which means

$e->getElementsByTagName('a')->item(0);

returns the first link under $e,

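and $documentLinks = $e->getElementsByTagName('a')->item(0)->nodeValue; returns the text of the first link. That is a string, not a node list, so the for loop has nothing to iterate over.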

http://php.net/manual/en/domdocument.getelementsbytagname.php
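
For reference, here is a minimal sketch of the question's code with that one line changed. The $matches array is a hypothetical addition, not part of the original code, used only to collect the results before printing; everything else follows the snippet in the question.

```php
<?php
// Same sample markup as in the question.
$html  = '<ol>';
$html .= '  <li id="stuff-123"> some copy here </li>';
$html .= '  <li id="stuff-456"> some copy here <a href="http://domain.com">domain</a> </li>';
$html .= '  <li id="stuff-789"> some copy here </li>';
$html .= '</ol>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$matches = []; // hypothetical collector, not in the original code
foreach ($xpath->query('//ol/li[starts-with(@id, "stuff")]') as $li) {
    // Keep the DOMNodeList itself instead of reducing it to a string.
    $documentLinks = $li->getElementsByTagName('a');
    for ($i = 0; $i < $documentLinks->length; $i++) {
        $documentLink = $documentLinks->item($i);
        if (preg_match('/domain/i', $documentLink->getAttribute('href'))) {
            $matches[] = $documentLink->getAttribute('href');
        }
    }
}

echo implode("\n", $matches), "\n"; // prints "http://domain.com"
```

List items without any link simply yield an empty node list, so the inner loop is skipped for them.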

Answer 2

You can fetch the href attributes directly with XPath:

//ol/li[starts-with(@id, "stuff")]/a[contains(@href, "domain")]/@href

and then just do

foreach($result as $href){
    echo $href->nodeValue;
}

Note that the contains function is case-sensitive.
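
If you need a case-insensitive match, one common XPath 1.0 workaround is to lowercase the attribute with translate() before calling contains(). The sketch below assumes only ASCII letters matter; the sample markup and the $hrefs array are made up for illustration and are not from the question.

```php
<?php
// Hypothetical sample markup with mixed-case hrefs.
$html = '<ol>'
      . '<li id="stuff-1"><a href="http://DOMAIN.com">upper</a></li>'
      . '<li id="stuff-2"><a href="http://domain.com">lower</a></li>'
      . '<li id="stuff-3"><a href="http://example.org">other</a></li>'
      . '</ol>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// translate() maps each uppercase ASCII letter to its lowercase counterpart,
// so contains() effectively compares against a lowercased href.
$query = '//ol/li[starts-with(@id, "stuff")]'
       . '/a[contains(translate(@href, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", '
       . '"abcdefghijklmnopqrstuvwxyz"), "domain")]'
       . '/@href';

$hrefs = []; // hypothetical collector for the matched attribute values
foreach ($xpath->query($query) as $href) {
    $hrefs[] = $href->nodeValue; // original casing of the attribute is kept
}

echo implode("\n", $hrefs), "\n";
```

The search term passed to contains() has to be lowercase as well, since only the attribute value is being folded.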

Finding links that match a given string in an XPath / DOMDocument query
