I have some third-party XML that looks like this:
<body>
<text>Unimportant Introduction</text>
<text class="heading">Important Section 1</text>
<text>Important text</text>
<table>(Table data)</table>
<text>Other important text</text>
<text class="heading">Important Section 2</text>
<text class="heading"></text>
<text>Important text</text>
<text>Other important text</text>
<text class="heading">Important Section 3</text>
<text>Important text</text>
<table>(Table data)</table>
</body>
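What I want is every node starting at a non-empty <text class="heading">, stopping just before the next non-empty <text class="heading">. It is important that the last <text class="heading"> captures all the remaining nodes in <body>, so something like this (it does not have to be exact):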
array(
0 => DOMNodeList {
<text class="heading">Important Section 1</text>
<text>Important text</text>
<table>(Table data)</table>
<text>Other important text</text>
},
1 => DOMNodeList {
<text class="heading">Important Section 2</text>
<text class="heading"></text>
<text>Important text</text>
<text>Other important text</text>
},
2 => DOMNodeList {
<text class="heading">Important Section 3</text>
<text>Important text</text>
<table>(Table data)</table>
}
)
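If I can't do this (splitting and grouping the children) in a single XPath expression, a loop is fine too.
I can already find the non-empty <text class="heading"> nodes with //body/text[@class='heading' and string-length(text()) > 0], but I don't know how to add all of their sibling nodes.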
Edit:
I just realized that what I really want is more like this:
array(
0 => DOMElement {
<body>
<text class="heading">Important Section 1</text>
<text>Important text</text>
<table>(Table data)</table>
<text>Other important text</text>
</body>
},
1 => DOMElement {
<body>
<text class="heading">Important Section 2</text>
<text class="heading"></text>
<text>Important text</text>
<text>Other important text</text>
</body>
},
2 => DOMElement {
<body>
<text class="heading">Important Section 3</text>
<text>Important text</text>
<table>(Table data)</table>
</body>
}
)
Having all the required nodes inside a <body> node would be very useful!
Answer 0 (score: 0)
The following code does what I want, using a loop:
<?php
$xml = <<<EOT
<body>
<text>Unimportant Introduction</text>
<text class="heading">Important Section 1</text>
<text>Important text 1</text>
<table>(Table data) 1</table>
<text>Other important text 1</text>
<text class="heading">Important Section 2</text>
<text class="heading"></text>
<text>Important text 2</text>
<text>Other important text 2</text>
<text class="heading">Important Section 3</text>
<text>Important text 3</text>
<table>(Table data) 3</table>
</body>
EOT;
$dom = new DOMDocument();
$dom->loadXML($xml);
$finder = new DOMXPath($dom);
$heading = "text[@class='heading' and string-length(text()) > 0]";
$nodes = $finder->query("//body/{$heading}");
$num_sections = $nodes->length;
for ($num = 1; $num <= $num_sections; ++$num) {
    // Find all nodes that match the nth heading or any nodes after the nth heading
    // (nth heading plus all following nodes)
    $node_set1 = "(//body/{$heading}[{$num}] | //body/{$heading}[{$num}]/following-sibling::*)";
    // Find the next heading after the nth heading (if it exists) or any nodes after that
    // (n+1-th heading plus all following nodes)
    $node_set2 = "//body/{$heading}[{$num}]/following-sibling::{$heading}[1]"
        . " | //body/{$heading}[{$num}]/following-sibling::{$heading}[1]/following-sibling::*";
    // Find all nodes that are in the first set but not in the second set
    // (XPath 1.0 set difference: keep a node only if adding it to set 2 changes the count)
    $section_nodes = $finder->query("{$node_set1}[count(. | {$node_set2}) != count({$node_set2})]");
    print("Section $num:<br/>\n");
    foreach ($section_nodes as $node) {
        $sx = simplexml_import_dom($node);
        var_dump($sx->asXML());
    }
}
?>
Output (with Xdebug):
Section 1:
string '<text class="heading">Important Section 1</text>' (length=48)
string '<text>Important text 1</text>' (length=29)
string '<table>(Table data) 1</table>' (length=29)
string '<text>Other important text 1</text>' (length=35)
Section 2:
string '<text class="heading">Important Section 2</text>' (length=48)
string '<text class="heading"/>' (length=23)
string '<text>Important text 2</text>' (length=29)
string '<text>Other important text 2</text>' (length=35)
Section 3:
string '<text class="heading">Important Section 3</text>' (length=48)
string '<text>Important text 3</text>' (length=29)
string '<table>(Table data) 3</table>' (length=29)
I don't know whether this is the simplest solution, but it does the job!
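For the form requested in the question's edit (each section wrapped in its own <body> element), a plain loop over the children may be easier to follow than one XPath per section. The following is only a minimal sketch, not part of the original answer: the function name splitIntoSections is hypothetical, and it assumes <body> is the document element and that "non-empty" means the heading's text content is not the empty string.

```php
<?php
// Hypothetical helper (not from the original post): groups the element
// children of <body> by non-empty heading and wraps each group in a new <body>.
function splitIntoSections(DOMDocument $dom): array
{
    $xpath = new DOMXPath($dom);
    $sections = [];
    $current = null; // stays null until the first non-empty heading is seen

    // Assumption: <body> is the root element of the document.
    foreach ($xpath->query('/body/*') as $child) {
        $isHeading = $child->nodeName === 'text'
            && $child->getAttribute('class') === 'heading'
            && $child->textContent !== '';

        if ($isHeading) {
            // A non-empty heading starts a new section.
            $current = $dom->createElement('body');
            $sections[] = $current;
        }
        if ($current !== null) {
            // Deep-clone so the source document is left untouched.
            $current->appendChild($child->cloneNode(true));
        }
    }
    return $sections;
}

// Usage: print each wrapped section as XML.
$doc = new DOMDocument();
$doc->loadXML(
    '<body><text>Intro</text>'
    . '<text class="heading">Section 1</text><text>Body 1</text>'
    . '<text class="heading">Section 2</text><text class="heading"></text><table>T</table>'
    . '</body>'
);
foreach (splitIntoSections($doc) as $section) {
    echo $doc->saveXML($section), "\n";
}
```

Nodes that appear before the first non-empty heading are skipped, and an empty heading is simply kept inside the section it falls in, which matches the grouping shown in the question. The returned <body> elements are created by the same document but are not attached to it.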