Strange XPath behavior - text nodes are not matched

Time: 2016-08-20 14:21:15

Tags: php xml xpath

When I do something like this:

foreach ($xpath->query('.//tpl-static', $domTemplateContainer) as $domStatic) {
    /* ... */
    $domStatic->parentNode->removeChild($domStatic);
}

everything seems to work fine.

But when dealing with XML comments and, more importantly, text nodes, it does not work as expected:

foreach ($xpath->query('.//text()[normalize-space() = ""]', $domDocumentFragment) as $domNode) {
    $domNode->parentNode->removeChild($domNode);
}

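Some text nodes are selected and some are not, and I cannot find the logic behind it. The predicate does not matter. However, I also found that the following query works correctly: ./descendant-or-self::text()[normalize-space() = ""]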

Why does .// work only for element nodes and not for text nodes? Is this a libxml/PHP bug that should be reported, or am I missing something?

Addition:

Full example (adapted from a complex project):

$xml = '
<tpl-static>
    <link rel="shortcut icon" type="image/x-icon" href="/static/images/icon.ico" />
    <link rel="stylesheet" type="text/css" href="/static/css/html5reset-1.6.1.css" />
    <link rel="stylesheet" type="text/css" href="/static/css/style.css" />
    <script src="/static/js/underscore.js"></script>
    <!-- <script src="/static/js/jquery.adaptive-backgrounds.js"></script> -->
    <script src="/static/js/jquery.maskedinput.min.js"></script>
    <link href="/static/js/jquery-ui-1.11.2.custom/jquery-ui.css" rel="stylesheet"/>
    <script src="/static/js/jquery-ui-1.11.2.custom/jquery-ui.min.js"></script>
    <link rel="stylesheet" href="/static/js/jquery.magnific-popup/magnific-popup.css" />
    <script src="/static/js/jquery.magnific-popup/jquery.magnific-popup.js"></script>
    <script src="/static/templates/dealers-page-includes/page-includes.js"></script>
</tpl-static>
<br/>

';

$domDocument = new \DOMDocument('1.0', 'utf-8');
$xpath = new \DOMXPath($domDocument);
$domDocumentFragment = $domDocument->createDocumentFragment();
$domDocumentFragment->appendXml($xml);

$templateName = 'test';
//$it = $this;
$adoptTemplate = function($domTemplateContainer) use (&$adoptTemplate, /*$it,*/ $domDocument, $xpath, $templateName) {

    foreach ($xpath->query('.//comment()', $domTemplateContainer) as $domComment) {
        $domComment->parentNode->removeChild($domComment);
    }

    foreach ($xpath->query('.//tpl-static', $domTemplateContainer) as $domStatic) {
        foreach ($domStatic->childNodes as $curChildNode) {
            //$it->_domDocumentHead->appendChild($curChildNode->cloneNode(true));
        }
        $domStatic->parentNode->removeChild($domStatic);
    }
};

$adoptTemplate($domDocumentFragment);

// FAIL!
/*foreach ($xpath->query('.//text()[normalize-space() = ""]', $domDocumentFragment) as $domNode) {
    $domNode->parentNode->removeChild($domNode);
}*/
// Here is the workaround:
foreach ($xpath->query('./descendant-or-self::text()[normalize-space() = ""]', $domDocumentFragment) as $domNode) {
    $domNode->parentNode->removeChild($domNode);
}

if ($domDocumentFragment->childNodes->length > 1) {
    throw new \Exception('Single node expected in template "' . $templateName . '", ' . $domDocumentFragment->childNodes->length . ' given.');
}

1 answer:

Answer 0 (score: 1)

I stripped down your code to test different expressions.

$xml = '
<tpl-static>
    <link rel="shortcut icon" type="image/x-icon" href="/static/images/icon.ico" />
    <link rel="stylesheet" type="text/css" href="/static/css/html5reset-1.6.1.css" />
</tpl-static>
<br/>
';

$domDocument = new \DOMDocument('1.0', 'utf-8');
$xpath = new \DOMXPath($domDocument);
$domDocumentFragment = $domDocument->createDocumentFragment();
$domDocumentFragment->appendXml($xml);

$expressions = [
  './/text()[normalize-space() = ""]',
  './*/text()[normalize-space() = ""]',
  './descendant-or-self::text()[normalize-space() = ""]',
  './*/descendant-or-self::text()[normalize-space() = ""]'
];

foreach ($expressions as $expression) {
  $nodes = $xpath->evaluate($expression, $domDocumentFragment);
  var_dump($expression, $nodes->length);
}

Output:

string(33) ".//text()[normalize-space() = ""]"
int(3)
string(34) "./*/text()[normalize-space() = ""]"
int(3)
string(52) "./descendant-or-self::text()[normalize-space() = ""]"
int(6)
string(54) "./*/descendant-or-self::text()[normalize-space() = ""]"
int(3)

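As you can see, the first two expressions return the same number of nodes, while the third (your workaround) returns a larger number. It looks like the first expression does not include the text nodes that are direct children of the fragment.

I modified the source to include a top-level element that can be used as the context for the expressions.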

$xml = '<foo>
<tpl-static>
    <link rel="shortcut icon" type="image/x-icon" href="/static/images/icon.ico" />
    <link rel="stylesheet" type="text/css" href="/static/css/html5reset-1.6.1.css" />
</tpl-static>
<br/>
</foo>';

$domDocument = new \DOMDocument('1.0', 'utf-8');
$xpath = new \DOMXPath($domDocument);
$domDocumentFragment = $domDocument->createDocumentFragment();
$domDocumentFragment->appendXml($xml);
$context = $domDocumentFragment->firstChild;

$expressions = [
  './/text()[normalize-space() = ""]',
  './*/text()[normalize-space() = ""]',
  './descendant-or-self::text()[normalize-space() = ""]',
  './*/descendant-or-self::text()[normalize-space() = ""]'
];

foreach ($expressions as $expression) {
  $nodes = $xpath->evaluate($expression, $context);
  var_dump($expression, $nodes->length);
}

Output:

string(33) ".//text()[normalize-space() = ""]"
int(6)
string(34) "./*/text()[normalize-space() = ""]"
int(3)
string(52) "./descendant-or-self::text()[normalize-space() = ""]"
int(6)
string(54) "./*/descendant-or-self::text()[normalize-space() = ""]"
int(3)

This returns the expected result. The first expression now includes the direct child nodes of the context.
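A minimal sketch of how the wrapper approach could be used to actually strip the whitespace-only text nodes. It assumes the same reduced sample as above; the wrapper name `foo` and the variable `$context` follow the test code, and the final count check is only an illustration.

```php
<?php
// Reduced sample from above, without a top-level element.
$xml = '
<tpl-static>
    <link rel="shortcut icon" type="image/x-icon" href="/static/images/icon.ico" />
    <link rel="stylesheet" type="text/css" href="/static/css/html5reset-1.6.1.css" />
</tpl-static>
<br/>
';

$domDocument = new \DOMDocument('1.0', 'utf-8');
$xpath = new \DOMXPath($domDocument);
$domDocumentFragment = $domDocument->createDocumentFragment();

// Wrap the source in a top-level element so that an element,
// not the fragment itself, serves as the XPath context node.
$domDocumentFragment->appendXML('<foo>' . $xml . '</foo>');
$context = $domDocumentFragment->firstChild;

// The query result is a snapshot, so removing nodes while looping is safe.
foreach ($xpath->query('.//text()[normalize-space() = ""]', $context) as $domNode) {
    $domNode->parentNode->removeChild($domNode);
}

// Only the element children of <foo> should remain.
var_dump($context->childNodes->length);
```

The remaining children of `$context` can then be moved wherever they are needed; the wrapper element itself is only a vehicle for the query.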

If the context node is a document fragment, .//text() appears to be interpreted differently.
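For reference, XPath 1.0 defines `//` as an abbreviation for `/descendant-or-self::node()/`, so `.//text()` is not literally the same expression as `./descendant-or-self::text()`. The sketch below writes out the expanded form and compares the three variants against an element context, where they should agree. The variable names `$abbreviated`, `$expanded` and `$direct` are introduced here for illustration only.

```php
<?php
// Same wrapped sample as in the second test above.
$xml = '<foo>
<tpl-static>
    <link rel="shortcut icon" type="image/x-icon" href="/static/images/icon.ico" />
    <link rel="stylesheet" type="text/css" href="/static/css/html5reset-1.6.1.css" />
</tpl-static>
<br/>
</foo>';

$domDocument = new \DOMDocument('1.0', 'utf-8');
$xpath = new \DOMXPath($domDocument);
$domDocumentFragment = $domDocument->createDocumentFragment();
$domDocumentFragment->appendXML($xml);
$context = $domDocumentFragment->firstChild; // the <foo> element

// Abbreviated form used in the question.
$abbreviated = $xpath->query('.//text()[normalize-space() = ""]', $context);

// What the abbreviation stands for according to XPath 1.0.
$expanded = $xpath->query(
    './descendant-or-self::node()/child::text()[normalize-space() = ""]',
    $context
);

// The workaround: text nodes selected directly on the descendant-or-self axis.
$direct = $xpath->query('./descendant-or-self::text()[normalize-space() = ""]', $context);

var_dump($abbreviated->length, $expanded->length, $direct->length);
```

With an element as context the three counts should be equal. The difference reported above only shows up when the fragment itself is the context node.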

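You might consider this a bug, but according to the W3C specification a fragment is not a valid context for an XPath expression. The specification says of the context node:

  

If the XPathEvaluator was obtained by casting the Document then this must be owned by the same document and must be a Document, Element, Attribute, Text, CDATASection, Comment, ProcessingInstruction, or XPathNamespace node.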

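So, to make your source conform to the specification, you have to iterate over the child nodes of the fragment and evaluate the expression against each of them. In that case, descendant-or-self::text() will work in a well-defined way.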
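A minimal sketch of that suggestion, assuming the reduced sample from the first test. The counter `$removed` and the snapshot array `$children` are names introduced here for illustration. The child list is copied first because `childNodes` is a live list and nodes are removed during the loop.

```php
<?php
$xml = '
<tpl-static>
    <link rel="shortcut icon" type="image/x-icon" href="/static/images/icon.ico" />
    <link rel="stylesheet" type="text/css" href="/static/css/html5reset-1.6.1.css" />
</tpl-static>
<br/>
';

$domDocument = new \DOMDocument('1.0', 'utf-8');
$xpath = new \DOMXPath($domDocument);
$domDocumentFragment = $domDocument->createDocumentFragment();
$domDocumentFragment->appendXML($xml);

// Snapshot the live child list before modifying the fragment.
$children = iterator_to_array($domDocumentFragment->childNodes);

$removed = 0;
foreach ($children as $child) {
    // Each child (element or text node) is a context type the specification allows.
    // The self step covers whitespace-only text nodes directly under the fragment.
    $expression = 'descendant-or-self::text()[normalize-space() = ""]';
    foreach ($xpath->query($expression, $child) as $domNode) {
        $domNode->parentNode->removeChild($domNode);
        $removed++;
    }
}

var_dump($removed, $domDocumentFragment->childNodes->length);
```

After the loop, the check from the question (`$domDocumentFragment->childNodes->length > 1`) can be applied to the fragment as before.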