Question

I am trying to parse an XML document (the content.xml of an odt file).

$reader = new XMLReader();
if (!$reader->open("content.xml")) {
    die("Failed to open 'content.xml'");
}
// Step through text:h and text:p elements to put them into an array
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && ($reader->name === 'text:h' || $reader->name === 'text:p')) {
        echo $reader->expand()->textContent; // Put the text into an array in the correct order...
    }
}
$reader->close();

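First, I need a hint on how to step through the elements of the XML file correctly. In my attempt I can step through the text:h elements, but how do I get the other elements (text:p) without messing everything up?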

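Still, I will tell you my final goal. Please don't think I am asking for a complete solution; I only wrote everything down to show the structure I need. I want to solve this problem step by step.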

The content of this XML file looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
[...]
<office:body>
    <office:text text:use-soft-page-breaks="true">
        <text:h text:style-name="P1" text:outline-level="2">Chapter 1</text:h>
            <text:p text:style-name="Standard">Lorem ipsum. </text:p>

            <text:h text:style-name="Heading3" text:outline-level="3">Subtitle 1</text:h>
                <text:p text:style-name="Standard"><text:span text:style-name="T2">Something 1:</text:span> Lorem.</text:p>
                <text:p text:style-name="Standard"><text:span text:style-name="T3">Something 2:</text:span><text:s/>Lorem ipsum.</text:p>
                <text:p text:style-name="Standard"><text:span text:style-name="T4">Something 3:</text:span> Lorem ipsum.</text:p>

            <text:h text:style-name="Heading3" text:outline-level="3">Subtitle 2</text:h>
                <text:p text:style-name="Standard"><text:span text:style-name="T5">10</text:span><text:span text:style-name="T6">:</text:span><text:s/>Text (100%)</text:p>
                    <text:p text:style-name="Explanation">Further informations.</text:p>
                <text:p text:style-name="Standard">9.7:<text:s/>Text (97%)</text:p>
                    <text:p text:style-name="Explanation">Further informations.</text:p>
                <text:p text:style-name="Standard"><text:span text:style-name="T9">9.1:</text:span><text:s/>Text (91%)</text:p>
                    <text:p text:style-name="Explanation">Further informations.</text:p>
                    <text:p text:style-name="Explanation">More furter informations.</text:p>

            [Subtitle 3 and 4]

            <text:h text:style-name="Heading3" text:outline-level="3">Subtitle 5</text:h>
                <text:p text:style-name="Standard"><text:span text:style-name="T5">10</text:span><text:span text:style-name="T6">:</text:span><text:s/>Text (100%)</text:p>
                    <text:p text:style-name="Explanation">Further informations.</text:p>
                <text:p text:style-name="Standard">9.7:<text:s/>Text (97%)</text:p>
                    <text:p text:style-name="Explanation">Further informations.</text:p>
                <text:p text:style-name="Standard"><text:span text:style-name="T9">9.1:</text:span><text:s/>Text (91%)</text:p>
                    <text:p text:style-name="Explanation">Further informations.</text:p>
                    <text:p text:style-name="Explanation">More furter informations.</text:p>

            <text:h text:style-name="Heading3" text:outline-level="3">References</text:h>
                <text:list text:style-name="LFO44" text:continue-numbering="true">
                    <text:list-item><text:p text:style-name="P25">blabla et al., Any Title p. 580-586</text:p></text:list-item>
                    <text:list-item><text:p text:style-name="P25">blabla et al., Any Title p. 580-586</text:p></text:list-item>
                    <text:list-item><text:p text:style-name="P25">blabla et al., Any Title p. 580-586</text:p></text:list-item>
                    <text:list-item><text:p text:style-name="P25">blabla et al., Any Title p. 580-586</text:p></text:list-item>
                </text:list>

        [Multiple Chapter like this]

    </office:text>
</office:body>

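As you can see, the "subchapters" always have a Standard element and an optional Explanation element (one Standard element can also have several Explanation elements). This structure is always the same.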

My final goal is to split up all the information to get an array output like this:

array() {
  [1]=>
  array() {
    ["chapter"]=>
    string() "Chapter 1"
    ["content"]=>
    array() {
      [0]=>
      array() {
        ["subchapter"]=>
        string() "Description"
        ["content"]=>
        array() {
          [0]=>
          array() {
            ["standard"]=>
            string() "Lorem ipsum."
            ["explanation"]=>
            string(0) ""
          }
        }
      }
      [1]=>
      array() {
        ["subchapter"]=>
        string() "Subtitle 1"
        ["content"]=>
        array() {
          [0]=>
          array() {
            ["standard"]=>
            string() "Something 1: Lorem."
            ["explanation"]=>
            string() ""
          }
          [1]=>
          array() {
            ["standard"]=>
            string() "Something 2: Lorem ipsum."
            ["explanation"]=>
            string() ""
          }
          [2]=>
          array() {
            ["standard"]=>
            string() "Something 3: Lorem ipsum."
            ["explanation"]=>
            string() ""
          }          
        }
      }
      [2]=>
      array() {
        ["subchapter"]=>
        string() "Subtitle 2"
        ["content"]=>
        array() {
          [0]=>
          array() {
            ["standard"]=>
            string() "10: Text (100%)"
            ["explanation"]=>
            string() "Further informations."
          }
    [and so on]

Answer 1

Edit:

I can see your problem now, thanks for editing the question:

In your while loop

while ($reader->read()){ 

}

you can use several members to get at the node and its values:

$reader->value

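will give the value of the current node (e.g. 'Subtitle 1'). Note that this is only filled on text nodes; on an element node such as text:h it is an empty string, so read the text with $reader->readString() or with $reader->expand()->textContent as in your code.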

$reader->getAttribute('text:style-name')

should get the 'Heading3' part.
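To make these calls concrete, here is a minimal sketch that collects the name, style attribute and text of each text:h and text:p element. The XML string is a hypothetical miniature of content.xml (the two namespace URIs are the standard ODF ones, declared so the snippet parses on its own), and the $rows layout is only an illustration, not the structure asked for in the question.

```php
<?php
// Hypothetical miniature of content.xml, small enough to read at a glance.
$xml = <<<XML
<office:text xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
             xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <text:h text:style-name="Heading3" text:outline-level="3">Subtitle 1</text:h>
  <text:p text:style-name="Standard"><text:span text:style-name="T2">Something 1:</text:span> Lorem.</text:p>
</office:text>
XML;

$reader = new XMLReader();
$reader->XML($xml); // parse from a string instead of open("content.xml")

$rows = [];
while ($reader->read()) {
    // Only look at opening tags of headings and paragraphs.
    if ($reader->nodeType !== XMLReader::ELEMENT) {
        continue;
    }
    if ($reader->name !== 'text:h' && $reader->name !== 'text:p') {
        continue;
    }
    $rows[] = [
        'name'  => $reader->name,
        'style' => $reader->getAttribute('text:style-name'),
        'value' => $reader->value,        // empty string on element nodes
        'text'  => $reader->readString(), // text of the element and its children
    ];
}
$reader->close();

print_r($rows);
```

The 'value' column is kept only to show that it stays empty on element nodes; in real code 'text' is the one you want.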

All in all, you probably want something like this in the while loop [pseudocode]:

 // set an index (once, before the while loop, so it is not reset on every node)
 $i = 0;
 $array = [];

 // inside the while loop: get the parts from the XML that we need
 $name = $reader->name;
 $attrib = $reader->getAttribute('text:style-name');
 $value = $reader->readString(); // text of the current element

 // if the style attribute is 'P1', increment our index, as we need a new top-level entry in our array
 if ($attrib === 'P1') {
     $i++;
 }

 $array[$i][$attrib] = $value;

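Note that this only nests to one level. It looks like you need 4 levels, so you should have 4 indexes [$i, $j, $k, $l] and check at each level whether you need to nest deeper: P1, Heading3, etc.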

You might end up with

$array[$i][$j][$k] = $reader->value;

etc. When you increment a higher-level index, remember to reset all the sub-indexes (e.g. if you do $i++, set $j = 0, $k = 0, etc.)
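The index bookkeeping could be sketched roughly like this. parseOdtText() is a hypothetical helper, not part of XMLReader, and it rests on assumptions read off the sample in the question: a text:h with text:outline-level="2" starts a chapter, any other text:h starts a subchapter, a text:p with style "Explanation" belongs to the preceding paragraph, and paragraphs sitting directly under a chapter go into a "Description" subchapter.

```php
<?php
// Hypothetical helper: groups text:h / text:p into chapter > subchapter > entries.
function parseOdtText(XMLReader $reader): array
{
    $chapters = [];
    $c = -1; // chapter index
    $s = -1; // subchapter index within the current chapter
    $e = -1; // entry index within the current subchapter

    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue;
        }
        if ($reader->name !== 'text:h' && $reader->name !== 'text:p') {
            continue;
        }
        $text = trim($reader->readString());

        if ($reader->name === 'text:h') {
            if ($reader->getAttribute('text:outline-level') === '2') {
                // New chapter: bump $c and reset the lower-level indexes.
                $chapters[++$c] = ['chapter' => $text, 'content' => []];
                $s = -1;
            } elseif ($c >= 0) {
                $chapters[$c]['content'][++$s] = ['subchapter' => $text, 'content' => []];
            }
            $e = -1;
            continue;
        }

        if ($c < 0) {
            continue; // paragraph before the first chapter: ignored in this sketch
        }
        if ($s < 0) {
            // Assumed convention: text directly under a chapter is its "Description".
            $chapters[$c]['content'][++$s] = ['subchapter' => 'Description', 'content' => []];
            $e = -1;
        }
        if ($reader->getAttribute('text:style-name') === 'Explanation' && $e >= 0) {
            // Attach to the preceding standard paragraph; several explanations are joined.
            $old = $chapters[$c]['content'][$s]['content'][$e]['explanation'];
            $chapters[$c]['content'][$s]['content'][$e]['explanation'] = ltrim($old . ' ' . $text);
        } else {
            $chapters[$c]['content'][$s]['content'][++$e] = ['standard' => $text, 'explanation' => ''];
        }
    }
    return $chapters;
}

// Usage on a hypothetical miniature of content.xml.
$xml = <<<XML
<office:text xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
             xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <text:h text:style-name="P1" text:outline-level="2">Chapter 1</text:h>
  <text:p text:style-name="Standard">Lorem ipsum. </text:p>
  <text:h text:style-name="Heading3" text:outline-level="3">Subtitle 2</text:h>
  <text:p text:style-name="Standard">9.1: Text (91%)</text:p>
  <text:p text:style-name="Explanation">Further informations.</text:p>
  <text:p text:style-name="Explanation">More further informations.</text:p>
</office:text>
XML;

$reader = new XMLReader();
$reader->XML($xml);
$result = parseOdtText($reader);
$reader->close();
print_r($result);
```

One thing this sketch does not handle: text:s is an empty element that stands for a space, so "9.7:<text:s/>Text (97%)" comes out as "9.7:Text (97%)". If the space matters, it has to be re-inserted separately.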

Original answer below:

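SimpleXML can (probably) do this in a few lines [if the structure of the XML file is already nested in the right way, which, after a quick look, it seems to be]: http://php.net/manual/en/book.simplexml.php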

$xml = simplexml_load_file('content.xml');
$json = json_encode($xml);
$array = json_decode($json,TRUE);

print_r($array);

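Edit: you can also use XPath with SimpleXML. Because the elements are namespaced, plain property access will not find them, so you have to go through children() with the prefix, something like

echo $xml->children('office', true)->body->children('office', true)->text->children('text', true)->h;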
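Here is a small sketch of namespace-aware access with SimpleXML, using both xpath() and children(). It assumes the root element is office:document-content (the question elides that part) and uses a hypothetical miniature document. As far as I can tell, the json_encode round trip above only sees elements without a namespace prefix, so for content.xml it is likely to come back nearly empty, which is why the namespace has to be named explicitly.

```php
<?php
// Hypothetical miniature of content.xml; the root element name is an assumption.
$source = <<<XML
<office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
                         xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:body>
    <office:text>
      <text:h text:outline-level="2">Chapter 1</text:h>
      <text:p text:style-name="Standard">Lorem ipsum.</text:p>
      <text:h text:outline-level="3">Subtitle 1</text:h>
    </office:text>
  </office:body>
</office:document-content>
XML;

$xml = simplexml_load_string($source);

// XPath: bind the "text" prefix to the ODF text namespace, then query all headings.
$xml->registerXPathNamespace('text', 'urn:oasis:names:tc:opendocument:xmlns:text:1.0');
$headings = [];
foreach ($xml->xpath('//text:h') as $h) {
    $attrs = $h->attributes('text', true); // attributes carrying the text: prefix
    $headings[] = [
        'level' => (int) $attrs['outline-level'],
        'title' => (string) $h,
    ];
}

// Property access: children('prefix', true) selects the namespace at each step.
$first = (string) $xml->children('office', true)->body
                      ->children('office', true)->text
                      ->children('text', true)->h;

print_r($headings);
echo $first, PHP_EOL;
```

Keep in mind that casting a SimpleXML element to a string only returns its own direct text, so for a text:p that contains text:span children the XMLReader approach is the easier way to get the full paragraph text.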
