How do I get only the head and tail text of a node in DOMDocument?
For example, in this sample code I don't want to see the content of the <b> tag:
$dom = new DOMDocument();
$dom->loadXML('<?xml version="1.0" encoding="UTF-8"?>
<root>
<test>head text <b>some bold text</b> tail text</test>
</root>
');
foreach ($dom->getElementsByTagName('test') as $node) {
    echo 'nodeValue: '.$node->nodeValue."\n";
    echo 'textContent:'.$node->textContent."\n";
}
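Answer 0 (score: 1)
You have to iterate over the child nodes of each element and pick out only the text nodes (DOMText); any other child node can be ignored...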
$dom = new DOMDocument();
$dom->loadXML('<?xml version="1.0" encoding="UTF-8"?>
<root>
<test>head text <b>some bold text</b> tail text</test>
</root>
');
foreach ($dom->getElementsByTagName('test') as $node) {
    foreach ($node->childNodes as $sub) {
        if ($sub instanceof DOMText) {
            echo 'nodeValue: '.$sub->nodeValue."\n";
            echo 'textContent:'.$sub->textContent."\n";
        }
    }
}
which gives you...
nodeValue: head text
textContent:head text
nodeValue: tail text
textContent: tail text
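If the surrounding whitespace is not wanted, the same loop can be wrapped in a small helper. This is only a sketch: `directText` is a hypothetical name, not something from the answer above, and it assumes that trimmed strings are the desired result and that whitespace-only text nodes should be dropped.

```php
<?php
// Hypothetical helper (name and behavior are assumptions, not from the answer):
// returns the trimmed text of an element's direct DOMText children,
// skipping nodes that contain only whitespace.
function directText(DOMNode $node): array
{
    $parts = [];
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            $text = trim($child->data);
            if ($text !== '') {
                $parts[] = $text;
            }
        }
    }
    return $parts;
}

$dom = new DOMDocument();
$dom->loadXML('<root><test>head text <b>some bold text</b> tail text</test></root>');
$test = $dom->getElementsByTagName('test')->item(0);

// prints: head text | tail text
echo implode(' | ', directText($test)), "\n";
```

Note that DOMCdataSection extends DOMText, so CDATA sections that are direct children would be included as well.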
Answer 1 (score: 0)
As an alternative, you could also use DOMXPath with an XPath expression to get the text:
$dom = new DOMDocument();
$dom->loadXML('<?xml version="1.0" encoding="UTF-8"?>
<root>
<test>head text <b>some bold text</b> tail text</test>
</root>
');
$xpath = new DOMXPath($dom);
$elements = $xpath->query('/root/test/text()');
foreach ($elements as $element) {
    echo $element->nodeValue;
}
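A variation on the XPath approach, offered as a sketch rather than a definitive solution: the query can be evaluated relative to each element by passing it as the context node, and a `normalize-space()` predicate can filter out whitespace-only text nodes. The `$found` array is a name introduced here for illustration only.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<root><test>head text <b>some bold text</b> tail text</test></root>');
$xpath = new DOMXPath($dom);

$found = []; // illustrative variable, collects the trimmed text pieces
foreach ($dom->getElementsByTagName('test') as $node) {
    // 'text()' is evaluated relative to $node (the second argument);
    // the predicate keeps only text nodes with non-whitespace content.
    foreach ($xpath->query('text()[normalize-space()]', $node) as $text) {
        $found[] = trim($text->nodeValue);
    }
}

// prints "head text" and "tail text" on separate lines
echo implode("\n", $found), "\n";
```

Using a relative expression avoids hard-coding the `/root/test` path, which may help when the elements of interest appear at different depths.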