Question

I am trying to run a simple XPath query, and it now returns only an empty node list.

Input: any XML file. For example:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="pt-br" xml:lang="pt-br">
  <head> <meta charset="utf-8"/><title>test</title> </head>
  <body>
    <article id="etc"><p>Hello</p><p>Bye</p></article>
  </body>
</html>

I redid everything; here is a complete test:

$dom2 = new DOMDocument;
$dom2->load($pathFile);
$xpath2 = new DOMXPath($dom2);
$entries = $xpath2->query('//p');
// nothing here, the node list is empty:
var_dump($entries);  // length is zero!
foreach ($entries as $entry) {
    echo "Found {$entry->nodeValue},";
}
// but everything shows up here:
foreach ($dom2->getElementsByTagName('*') as $e) {
    print "\n name={$e->nodeName}";  // all tags!
}

What is wrong? Why does the XPath query not work?

Answer 1

That is because your XML declares a default namespace:

xmlns="http://www.w3.org/1999/xhtml"

So you need to register that namespace under a prefix, and then search using the prefixed tag name:

$xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');
$entries = $xpath->query('//x:p');
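
For completeness, here is a self-contained sketch of the same fix applied to the XML from the question. It loads the document from an inline string instead of `$pathFile` (an assumption made only to keep the example runnable), and the prefix `x` is arbitrary: only the namespace URI has to match.

```php
<?php
// Inline copy of the question's XML, so the example runs without a file.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="pt-br" xml:lang="pt-br">
  <head><meta charset="utf-8"/><title>test</title></head>
  <body>
    <article id="etc"><p>Hello</p><p>Bye</p></article>
  </body>
</html>
XML;

$dom = new DOMDocument;
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// In XPath 1.0 an unprefixed name only matches elements in no namespace,
// so this query finds nothing in a document with a default namespace.
$plain = $xpath->query('//p');

// Bind a prefix of our choice to the document's namespace URI.
$xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');
$prefixed = $xpath->query('//x:p');

echo "unprefixed: {$plain->length}\n";
echo "prefixed: {$prefixed->length}\n";
foreach ($prefixed as $entry) {
    echo "Found {$entry->nodeValue}\n";
}
```

The prefix exists only on the XPath side; the XML file itself does not need to change.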

Answer 2

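This is an old problem with W3C's DomDocument v1.0 standard. On the surprise it causes XPath beginners (https://weblogs.asp.net/gunnarpeipman/stepping-into-asp-net-mvc-source-code-with-visual-studio-debugger):

One of the most common questions about (...) is: "Why does my XPath expression return no matches when it looks right to me?" The usual cause of these problems is that the namespace has not been defined correctly for XPath.

But the beginners are right: this is ugly behavior for a "default" thing. So let's preserve the beginner's good intuition about what is simple and good.

It is awful to see an XPath that does not look like what you need (that is, like what the XML appears to be when its tags have no prefix). Simple tags deserve a simple XPath.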
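
As a side note, one commonly used way to keep the query independent of prefixes without touching the document is the XPath 1.0 `local-name()` function. This is a different technique from the workaround described in this answer, and the sketch below assumes you do not care which namespace the matched elements belong to.

```php
<?php
// Minimal namespaced document, standing in for the question's XHTML file.
$xml = '<html xmlns="http://www.w3.org/1999/xhtml">'
     . '<body><p>Hello</p><p>Bye</p></body></html>';

$dom = new DOMDocument;
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Match any element whose local name is "p", whatever its namespace.
$entries = $xpath->query("//*[local-name()='p']");

foreach ($entries as $entry) {
    echo "Found {$entry->nodeValue}\n";
}
```

The expression is longer than `//p`, and it will also match `p` elements from unrelated namespaces, so it trades precision for convenience.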

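Reliable workaround

Fix the ugly XPath query behavior with the best workaround available. It is not simple, because the root xmlns attribute is read-only (as an old site commented), so we need to rebuild the DOM object from a new XML string: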

$expTag = 'html';  // config: expected root tag
$expNs  = 'http://www.w3.org/1999/xhtml';  // config: expected default namespace
// ...
$e = $dom->documentElement; // root node

// Validate the input against the expected config, then rewrite the root tag:
if ($e->nodeName === $expTag && $e->hasAttribute('xmlns')
    && $e->getAttribute('xmlns') === $expNs) {
  // can't do $e->removeAttribute('xmlns') because it is read-only!
  $xml = $dom->C14N(); // normalizes quotes and removes repeated declarations
  $xml = preg_replace("#^<$expTag (.*?)xmlns=\"[^\"]+\"#", "<$expTag\$1", $xml);
  $dom = new DOMDocument;
  $dom->loadXML($xml); // loadXML() is an instance method; do not call it statically
} else {
  die("\n ERROR: something not expected.\n");
}
//...
$xpath = new DOMXPath($dom);
$entries = $xpath->query('//p'); // now the XPath is simple to express again

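This workaround can only be used where nothing restricts it, for example where the document does not have to stay read-only.

Other problems in real-world settings are the high CPU cost of saving and reloading the full XML as a string, and the safe but more expensive method borrowed from digital preservation (C14N), which prepares safe XML for the regular expression.

Using C14N (which is also useful for other purposes in a digital preservation context) is necessary to guarantee that the regular expression behaves correctly. Strictly speaking, the getAttribute() method can be affected by a duplicated attribute, but we can ignore this "second-order" effect, or move the check into the regular expression.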
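
To illustrate why the canonicalization step matters, here is a small sketch. The input string is made up for the example: it uses single-quoted attributes and puts `xmlns` after another attribute, so a regular expression that expects `xmlns="..."` directly after the root tag would not match the raw text. After `C14N()` the XML declaration is gone, quotes are double quotes, and the namespace declaration comes first.

```php
<?php
// Hypothetical input: single quotes, xmlns not in first position.
$raw = "<?xml version='1.0' encoding='UTF-8'?>\n"
     . "<html lang='pt-br' xmlns='http://www.w3.org/1999/xhtml'><p>Hi</p></html>";

$dom = new DOMDocument;
$dom->loadXML($raw);

// Canonical form: no XML declaration, double quotes, xmlns before attributes.
$canonical = $dom->C14N();
echo $canonical, "\n";

// The same kind of pattern as in the workaround now matches at the start.
$stripped = preg_replace('#^<html (.*?)xmlns="[^"]+"#', '<html$1', $canonical);

// Reload the namespace-free string and query with a plain XPath.
$plainDom = new DOMDocument;
$plainDom->loadXML($stripped);
$plainXpath = new DOMXPath($plainDom);
$count = $plainXpath->query('//p')->length;
echo "matches for //p: {$count}\n";
```

Without the `C14N()` call, the pattern would have to cope with both quote styles, arbitrary attribute order, and the leading XML declaration.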
