Question

This is the structure of the XML file (an ODT file) that I am trying to parse:

<office:body>
    <office:text>
        <text:h text:style-name="P1" text:outline-level="2">Chapter 1</text:h>
            <text:p text:style-name="Standard">Lorem ipsum. </text:p>

            <text:h text:style-name="Heading3" text:outline-level="3">Subtitle 2</text:h>
                <text:p text:style-name="Standard"><text:span text:style-name="T5">10</text:span><text:span text:style-name="T6">:</text:span><text:s/>Text (100%)</text:p>
                    <text:p text:style-name="Explanation">Further informations.</text:p>
                <text:p text:style-name="Standard">9.7:<text:s/>Text (97%)</text:p>
                    <text:p text:style-name="Explanation">Further informations.</text:p>
                <text:p text:style-name="Standard"><text:span text:style-name="T9">9.1:</text:span><text:s/>Text (91%)</text:p>
                    <text:p text:style-name="Explanation">Further informations.</text:p>
                    <text:p text:style-name="Explanation">More furter informations.</text:p>
    </office:text>
</office:body>

With XMLReader I do it like this:

$html = '';
while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name === 'text:h') {
        if ($reader->getAttribute('text:outline-level') == "2") $html .= '<h2>'.$reader->expand()->textContent.'</h2>';
    }
    elseif ($reader->nodeType == XMLReader::ELEMENT && $reader->name === 'text:p') {
        if ($reader->getAttribute('text:style-name') == "Standard") {
            $html .= '<p>'.$reader->readInnerXml().'</p>';
        }
        else {
            // Doing something different
        }
    }
}
echo $html;

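Now I would like to do the same thing with DOMDocument, but I need some help with the syntax. How can I loop through all children of office:text? While looping over the nodes I would check with if/else what to do (text:h vs. text:p).

I also need to replace every text:s with a space (if there is such an element inside a text:p)...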

$reader = new DOMDocument();
$reader->preserveWhiteSpace  = false;
$reader->load('zip://content.odt#content.xml');

$body = $reader->getElementsByTagName( 'office:text' )->item( 0 );
foreach( $body->childNodes as $node ) echo $node->nodeName . PHP_EOL;

Or would it be smarter to loop through all text elements? If so, the question remains how to do that.

$elements = $reader->getElementsByTagName('text');
foreach($elements as $node){
    foreach($node->childNodes as $child) {
        echo $child->nodeName.': ';
        echo $child->nodeValue.'<br>';
        // check for type...
    }
}

Answer 1

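One of the easiest ways to do this with DOMDocument is with the help of DOMXPath.

Taking your question literally:

How can I loop through all children of office:text?

This can be expressed as an XPath expression: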

//office:text/child::node()

But your wording is slightly off here. You want not only the children but also the children's children and so on. Those are all descendants:

//office:text/descendant::node()

Or with the abbreviated syntax:

//office:text//node()
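To make the difference between the child and descendant expressions concrete, here is a minimal sketch. The inline XML is a shortened stand-in for content.xml (an assumption for illustration); the namespace URIs are the standard OpenDocument ones.

```php
<?php
// Shortened stand-in for content.xml (illustrative only).
$xml = <<<'XML'
<office:body xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
             xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
<office:text><text:h text:outline-level="2">Chapter 1</text:h><text:p text:style-name="Standard">9.7:<text:s/>Text (97%)</text:p></office:text>
</office:body>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('office', 'urn:oasis:names:tc:opendocument:xmlns:office:1.0');

// Direct children only: the text:h and the text:p element.
$children = $xpath->query('//office:text/child::node()');

// All descendants: the two elements, the text:s element and three text nodes.
$descendants = $xpath->query('//office:text//node()');

echo $children->length, PHP_EOL;    // prints 2
echo $descendants->length, PHP_EOL; // prints 6
```

If the file contains indentation between elements, both counts grow by the whitespace text nodes unless they are stripped while loading.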

Compare with: XPath to Get All ChildNodes and not the Parent Node

To loop over them in PHP, you need to register the namespace for the office prefix and then iterate over the XPath result with foreach:

$xpath = new DOMXPath($reader);
$xpath->registerNamespace('office', $xml_namespace_uri_of_office_namespace);

$descendants = $xpath->query('//office:text//node()');
foreach ($descendants as $node) {
    // $node is a DOMNode, e.g. a DOMElement, DOMText, ...
}
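Putting the pieces together, the conversion from the question could be sketched roughly as follows. This is only one possible approach, not a definitive implementation: `odtText()` and `odtToHtml()` are hypothetical helper names, the mapping of `text:outline-level` to `<h1>`..`<h6>` and of non-Standard styles to a `class` attribute are assumptions, and the inline XML stands in for the real content.xml. It walks only the direct children of office:text and turns each text:s into spaces (honouring the optional ODF `text:c` repeat count).

```php
<?php
// Standard OpenDocument namespace URIs.
const NS_OFFICE = 'urn:oasis:names:tc:opendocument:xmlns:office:1.0';
const NS_TEXT   = 'urn:oasis:names:tc:opendocument:xmlns:text:1.0';

// Hypothetical helper: text of a node, with each <text:s/> replaced by spaces.
function odtText(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            $out .= $child->nodeValue;
        } elseif ($child instanceof DOMElement) {
            if ($child->namespaceURI === NS_TEXT && $child->localName === 's') {
                // text:c holds the number of spaces; it defaults to 1 when absent.
                $out .= str_repeat(' ', max(1, (int) $child->getAttributeNS(NS_TEXT, 'c')));
            } else {
                $out .= odtText($child); // e.g. text:span
            }
        }
    }
    return $out;
}

// Hypothetical helper: convert the element children of office:text to HTML.
function odtToHtml(DOMDocument $doc): string
{
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('office', NS_OFFICE);

    $html = '';
    foreach ($xpath->query('//office:text/*') as $el) {
        if ($el->namespaceURI !== NS_TEXT) {
            continue;
        }
        $text = htmlspecialchars(odtText($el));
        if ($el->localName === 'h') {
            // Assumption: outline level N maps to <hN>, clamped to 1..6.
            $level = max(1, min(6, (int) $el->getAttributeNS(NS_TEXT, 'outline-level')));
            $html .= "<h$level>$text</h$level>\n";
        } elseif ($el->localName === 'p') {
            // Assumption: non-Standard paragraph styles become a CSS class.
            $style = $el->getAttributeNS(NS_TEXT, 'style-name');
            $class = $style === 'Standard' ? '' : ' class="' . htmlspecialchars($style) . '"';
            $html .= "<p$class>$text</p>\n";
        }
    }
    return $html;
}

$xml = <<<'XML'
<office:body xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
             xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:text>
    <text:h text:style-name="P1" text:outline-level="2">Chapter 1</text:h>
    <text:p text:style-name="Standard"><text:span text:style-name="T5">10</text:span><text:span text:style-name="T6">:</text:span><text:s/>Text (100%)</text:p>
    <text:p text:style-name="Explanation">Further informations.</text:p>
  </office:text>
</office:body>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$html = odtToHtml($doc);
echo $html;
```

For a real file, `$doc->load('zip://content.odt#content.xml')` from the question would replace the `loadXML()` call. Selecting `//office:text/*` instead of `//office:text//node()` keeps the loop on the top-level headings and paragraphs, while the recursion in `odtText()` takes care of nested spans.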

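XPath in general does not guarantee it, but PHP's libxml-based library does return the nodes in document order. That is the order you are looking for.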
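A quick way to see the ordering is to collect the element names from a query result. This is a small sketch with an inline stand-in document (an assumption for illustration), not a proof of the general behaviour:

```php
<?php
// Inline stand-in for a fragment of content.xml (illustrative only).
$xml = <<<'XML'
<office:text xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
             xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <text:h>Subtitle 2</text:h>
  <text:p><text:span>9.1:</text:span><text:s/>Text (91%)</text:p>
  <text:p>Further informations.</text:p>
</office:text>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('office', 'urn:oasis:names:tc:opendocument:xmlns:office:1.0');

// Collect the qualified names of all descendant elements in result order.
$names = [];
foreach ($xpath->query('//office:text//*') as $el) {
    $names[] = $el->nodeName;
}

echo implode(', ', $names), PHP_EOL; // prints: text:h, text:p, text:span, text:s, text:p
```

Each heading is therefore followed by its own paragraphs in the result, which is what an if/else on text:h vs. text:p inside a single loop relies on.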

Compare with: XPath query result order
