Question

This is my XPath, but it looks like the | operator only evaluates two operands. How can I use it with more than two? My code is below.

function extractNodeValue($query, $xPath, $attribute = null) {
    $node = $xPath->query("//{$query}")->item(0);
    if (!$node) {
        return null;
    }
    return $attribute ? $node->getAttribute($attribute) : $node->nodeValue;
}


$document = new DOMDocument();
$document->loadHTMLFile(${'html'.$i});
$xPath = new DOMXPath($document);

$tel = extractNodeValue('//*[@id="eventDetailInfo"]/div[3]/div[4] | //*[@id="eventDetailInfo"]/div[3]/div[3] | //*[@id="eventDetailInfo"]/div[3]/div[5]',$xPath);

Answer 1

When you write 2+2+2, + is a binary operator; your expression means (2+2)+2.

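In XPath, | is likewise a binary operator, but because its result has the same type as its operands (a node-set), it combines with itself in the same way: $x|$y|$z means ($x|$y)|$z.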
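A small sketch of that associativity, assuming a throwaway HTML string (not the asker's page): the flat three-way union and the explicitly grouped ($x|$y)|$z form select the same three elements, so chaining | has no two-operand limit.

```php
<?php
$document = new DOMDocument();
$document->loadHTML('<html><body><a>A</a><b>B</b><i>I</i></body></html>');
$xPath = new DOMXPath($document);

// The same three-way union, written flat and with explicit left-to-right grouping.
$flat    = $xPath->query('//a | //b | //i');
$grouped = $xPath->query('(//a | //b) | //i');

echo $flat->length, ' ', $grouped->length, PHP_EOL; // 3 3
```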

Answer 2

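The XPath query your extractNodeValue function builds comes out as //a | b | c. The // prefix applies only to the first operand, so it returns the a nodes and ignores the b and c nodes.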
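To see the difference, here is a minimal sketch on made-up markup (a, b and i stand in for the real expressions). Without their own //, the b and i operands are relative paths evaluated from the default context node, so they only look at that node's children instead of the whole document.

```php
<?php
$document = new DOMDocument();
$document->loadHTML('<html><body><div><a>A</a><b>B</b><i>I</i></div></body></html>');
$xPath = new DOMXPath($document);

// What "//{$query}" produces for $query = 'a | b | i': only "a" gets the "//".
$prefixedOnce = $xPath->query('//a | b | i');
// Every operand carries its own "//", so each one searches the whole document.
$prefixedEach = $xPath->query('//a | //b | //i');

echo $prefixedOnce->length, PHP_EOL; // 1: "b" and "i" match no children of the context node
echo $prefixedEach->length, PHP_EOL; // 3
```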

Presumably you want to run //a | //b | //c to get the first occurrence of an a, b or c node, right?

If so, you have to change how you use the $query parameter:

<?php
$html = <<<HTML
<html>
    <div>
        <a>Empire Burlesque</a>
        <b>Bob Dylan</b>
        <i>USA</i>
    </div>
    <div>
        <a>Hide your heart</a>
        <b>Bonnie Tyler</b>
        <i>UK</i>
    </div>
</html>
HTML;

function extractNodeValue($query, $xPath, $attribute = null) {
    // Use $query verbatim: the caller writes the full expression, "//" included.
    $node = $xPath->query($query)->item(0);
    if (!$node) {
        return null;
    }

    return $attribute ? $node->getAttribute($attribute) : $node->nodeValue;
}

$document = new DOMDocument();
$document->loadHTML($html);
$xPath = new DOMXpath($document);

$tel = extractNodeValue('//a | //b | //i', $xPath);
echo $tel;

Output:

Empire Burlesque

Answer 3

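Your question seems to be heading in the wrong direction; there is nothing wrong with the XPath. As already pointed out here, the number of nodes an XPath | query can find is not limited.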

But in extractNodeValue() you use ->item(0), which only looks at the first item.
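A quick way to confirm this, sketched on a made-up HTML string: the node list returned by the union holds every match (see ->length), while ->item(0) hands back only the first one.

```php
<?php
$document = new DOMDocument();
$document->loadHTML('<html><body><a>first</a><b>second</b><i>third</i></body></html>');
$xPath = new DOMXPath($document);

$nodes = $xPath->query('//a | //b | //i');
echo $nodes->length, PHP_EOL;             // 3: the union matched all three elements
echo $nodes->item(0)->nodeValue, PHP_EOL; // first: item(0) is only the first match
```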

If you want to output the "value" of every node found, try this:

function extractNodeValue($node, $attribute = null) {
    if (!$node) {
        return null;
    }
    return $attribute ? $node->getAttribute($attribute) : $node->nodeValue;
}

$document = new DOMDocument();
// As in the question, ${'html'.$i} is a file/URL, so load it with loadHTMLFile().
$document->loadHTMLFile(${'html'.$i});
$xPath = new DOMXPath($document);

// Iterate over every node the union matched, not just item(0).
$nodes = $xPath->query('//a | //b | //c');
foreach ($nodes as $n) {
    $tel = extractNodeValue($n);
    echo $tel, PHP_EOL;
}

If that doesn't work, your HTML page most likely doesn't match your XPath expression.

Update
Looking at the HTML page linked in the comments: to get the phone number, try this:

$tel = extractNodeValue('//div[@id="eventDetailInfo"]//div[@class= "tel"]',$xPath);

Returns:

string(15) "Phone: 22674608"

Answer 4

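As others have already pointed out, your phone query will produce multiple nodes, but your extractNodeValue function returns only one of them. My suggestion is to rewrite it: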

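function extractNodeValue($query, $xPath, $attribute = null) {
  $values = array();
  // Collect the value of every matched node, not just the first.
  foreach($xPath->query("//{$query}") as $node) {
    $values[] = $attribute ? $node->getAttribute($attribute) : $node->nodeValue;
  }
  // implode(separator, array): the reversed join($values, ", ") order fails in PHP 8.
  return implode(", ", $values);
}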

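This returns the values in a single comma-separated string, but it is easy to change the separator, or to return the values as an array if that is more useful to you.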

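I'd also suggest removing the "//" that is prepended to the query here and including it in the calling code instead. Otherwise you will end up adding it twice in some cases. That isn't essential, though.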
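Both suggestions can be combined in a sketch like the following. The plural name extractNodeValues is hypothetical, chosen only to distinguish it from the version above; it returns an array and leaves the full expression, including any leading //, to the caller.

```php
<?php
// Hypothetical helper: returns every matched value as an array.
// The caller passes the complete XPath expression; nothing is prepended here.
function extractNodeValues(string $query, DOMXPath $xPath, ?string $attribute = null): array
{
    $values = [];
    $nodes = $xPath->query($query);
    if ($nodes === false) { // query() returns false for a malformed expression
        return $values;
    }
    foreach ($nodes as $node) {
        $values[] = $attribute ? $node->getAttribute($attribute) : $node->nodeValue;
    }
    return $values;
}

$document = new DOMDocument();
$document->loadHTML('<html><body><a href="/x">X</a><a href="/y">Y</a></body></html>');
$xPath = new DOMXPath($document);

$texts = extractNodeValues('//a', $xPath);         // ['X', 'Y']
$hrefs = extractNodeValues('//a', $xPath, 'href'); // ['/x', '/y']
echo implode(', ', $texts), PHP_EOL;               // X, Y
```

Returning an array keeps the choice of separator with the caller; implode() turns it back into the comma-separated form when that is what you need.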

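As for the phone query itself, it depends on the phone div being at certain fixed positions, which is completely unreliable (on some pages it also matches the website and email address).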

As far as I can see, you need to match two different cases: under the "Where" div (div 3 of the eventDetailInfo section), and under the "Contact" div (div 4).

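Under the "Where" div, the phone number can be in different positions, but it always has a class of "tel", so the safest query is probably something like this: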

//*[@id="eventDetailInfo"]/div[3]/*[@class="tel"]
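Here is a sketch of that query against hypothetical markup. The real page is not shown, so the HTML below only mimics the structure described above (div 3 is "Where", div 4 is "Contact"), and the phone numbers are placeholders.

```php
<?php
// Hypothetical markup mimicking the described structure; not the real page.
$html = '<html><body><div id="eventDetailInfo">'
    . '<div>Title</div><div>When</div>'
    . '<div><div>Where</div><div class="tel">Phone: 111</div></div>'
    . '<div><div>Contact</div><div>Phone: 222</div><div>Email: info@example.com</div></div>'
    . '</div></body></html>';

$document = new DOMDocument();
$document->loadHTML($html);
$xPath = new DOMXPath($document);

// Match by class inside the "Where" div, whatever position the phone div has.
$nodes = $xPath->query('//*[@id="eventDetailInfo"]/div[3]/*[@class="tel"]');
foreach ($nodes as $node) {
    echo $node->nodeValue, PHP_EOL; // Phone: 111
}
```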

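Under the "Contact" div, the phone number can also be in different positions, but there is no class on it that you can match. However, the content of that div always starts with the string "Phone:", so one solution is to use the XPath starts-with function: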

//*[@id="eventDetailInfo"]/div[4]/div[starts-with(.,"Phone:")]
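The same hypothetical markup can be used to try the starts-with query. Only the text content is tested here, which is why the email div next to the phone div is not matched.

```php
<?php
// Hypothetical markup mimicking the described structure; not the real page.
$html = '<html><body><div id="eventDetailInfo">'
    . '<div>Title</div><div>When</div>'
    . '<div><div>Where</div><div class="tel">Phone: 111</div></div>'
    . '<div><div>Contact</div><div>Phone: 222</div><div>Email: info@example.com</div></div>'
    . '</div></body></html>';

$document = new DOMDocument();
$document->loadHTML($html);
$xPath = new DOMXPath($document);

// No class to match in the "Contact" div, so test the text content instead.
$nodes = $xPath->query('//*[@id="eventDetailInfo"]/div[4]/div[starts-with(.,"Phone:")]');
foreach ($nodes as $node) {
    echo $node->nodeValue, PHP_EOL; // Phone: 222
}
```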

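You can then combine these two queries with the union (|) operator to cover both cases. Alternatively (and I think this is the better solution), you can use the second query on its own by making it more general, like this: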

//*[@id="eventDetailInfo"]//div[starts-with(.,"Phone:")]
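To compare the two options, this sketch runs both the union of the two specific queries and the generalised starts-with query against the same hypothetical markup, collecting the matched text into arrays. On this markup they find the same two phone numbers.

```php
<?php
// Hypothetical markup mimicking the described structure; not the real page.
$html = '<html><body><div id="eventDetailInfo">'
    . '<div>Title</div><div>When</div>'
    . '<div><div>Where</div><div class="tel">Phone: 111</div></div>'
    . '<div><div>Contact</div><div>Phone: 222</div><div>Email: info@example.com</div></div>'
    . '</div></body></html>';

$document = new DOMDocument();
$document->loadHTML($html);
$xPath = new DOMXPath($document);

$where   = '//*[@id="eventDetailInfo"]/div[3]/*[@class="tel"]';
$contact = '//*[@id="eventDetailInfo"]/div[4]/div[starts-with(.,"Phone:")]';
$general = '//*[@id="eventDetailInfo"]//div[starts-with(.,"Phone:")]';

// Turn a query result into a plain array of nodeValue strings.
$valuesOf = fn(string $query): array =>
    array_map(fn($n) => $n->nodeValue, iterator_to_array($xPath->query($query)));

$fromUnion   = $valuesOf("$where | $contact");
$fromGeneral = $valuesOf($general);

print_r($fromUnion);
print_r($fromGeneral);
```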

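One possible drawback is that this no longer restricts the search to the "Where" and "Contact" divs, so if there are phone numbers in other parts of the eventDetailInfo section, it will match those too (though that may be a good thing).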
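To illustrate that trade-off, the hypothetical markup below adds a fifth section (the "Organizer" name and its number are invented for this sketch). The generalised query picks up the extra phone number, while the union of the two position-bound queries does not.

```php
<?php
// Hypothetical markup: same structure as before plus an invented div 5.
$html = '<html><body><div id="eventDetailInfo">'
    . '<div>Title</div><div>When</div>'
    . '<div><div>Where</div><div class="tel">Phone: 111</div></div>'
    . '<div><div>Contact</div><div>Phone: 222</div><div>Email: info@example.com</div></div>'
    . '<div><div>Organizer</div><div>Phone: 333</div></div>'
    . '</div></body></html>';

$document = new DOMDocument();
$document->loadHTML($html);
$xPath = new DOMXPath($document);

$union   = $xPath->query('//*[@id="eventDetailInfo"]/div[3]/*[@class="tel"] | //*[@id="eventDetailInfo"]/div[4]/div[starts-with(.,"Phone:")]');
$general = $xPath->query('//*[@id="eventDetailInfo"]//div[starts-with(.,"Phone:")]');

echo $union->length, PHP_EOL;   // 2: limited to the "Where" and "Contact" divs
echo $general->length, PHP_EOL; // 3: also matches the phone in the extra section
```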

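Note that even without the union, this query will still return multiple nodes on some pages. Either way, if you want to get all the values, you need the updated extractNodeValue function.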

XPath or-operator: how to evaluate more than two operands?
