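How can I convert an XML string to a PHP array with a different structure?

Asked: 2018-05-02 14:09:44

Tags: php arrays xml multidimensional-array domdocument

I have a method that converts an XML string into a PHP array with distinct keys and values, so that the XML is fully represented. However, when there are several child nodes of the same type, the array does not have the shape I need, and I am confused about how to change the method.

This is what the method looks like: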

/**
 * Converts an XML string to an array
 *
 * @param $xmlString
 * @return array
 */
private function parseXml($xmlString)
{
    $doc = new DOMDocument;
    $doc->loadXML($xmlString);
    $root = $doc->documentElement;
    $output[$root->tagName] = $this->domNodeToArray($root, $doc);

    return $output;
}

/**
 * @param $node
 * @param $xmlDocument
 * @return array|string
 */
private function domNodeToArray($node, $xmlDocument)
{
    $output = [];
    switch ($node->nodeType)
    {
        case XML_CDATA_SECTION_NODE:
        case XML_TEXT_NODE:
            $output = trim($node->textContent);
            break;
        case XML_ELEMENT_NODE:
            for ($i = 0, $m = $node->childNodes->length; $i < $m; $i++)
            {
                $child = $node->childNodes->item($i);
                $v = $this->domNodeToArray($child, $xmlDocument);

                if (isset($child->tagName))
                {
                    $t = $child->tagName;

                    if (!isset($output['value'][$t]))
                    {
                        $output['value'][$t] = [];
                    }
                    $output['value'][$t][] = $v;
                }
                else if ($v || $v === '0')
                {
                    $output['value'] = htmlspecialchars((string)$v, ENT_XML1 | ENT_COMPAT, 'UTF-8');
                }
            }

            if (isset($output['value']) && $node->attributes->length && !is_array($output['value']))
            {
                $output = ['value' => $output['value']];
            }

            if (!$node->attributes->length && isset($output['value']) && !is_array($output['value']))
            {
                $output = ['attributes' => [], 'value' => $output['value']];
            }

            if ($node->attributes->length)
            {
                $a = [];
                foreach ($node->attributes as $attrName => $attrNode)
                {
                    $a[$attrName] = (string)$attrNode->value;
                }
                $output['attributes'] = $a;
            }
            else
            {
                $output['attributes'] = [];
            }

            if (isset($output['value']) && is_array($output['value']))
            {
                foreach ($output['value'] as $t => $v)
                {
                    if (is_array($v) && count($v) == 1 && $t != 'attributes')
                    {
                        $output['value'][$t] = $v[0];
                    }
                }
            }
            break;
    }

    return $output;
}

Here is some sample XML:

<?xml version="1.0" encoding="UTF-8"?>
<characters>
   <character>
      <name2>Sno</name2>
      <friend-of>Pep</friend-of>
      <since>1950-10-04</since>
      <qualification>extroverted beagle</qualification>
   </character>
   <character>
      <name2>Pep</name2>
      <friend-of>Sno</friend-of>
      <since>1966-08-22</since>
      <qualification>bold, brash and tomboyish</qualification>
   </character>
</characters>

Running the method with that XML as its argument produces this array:

array:1 [▼
  "characters" => array:2 [▼
    "value" => array:1 [▼
      "character" => array:2 [▼
        0 => array:2 [▼
          "value" => array:4 [▼
            "name2" => array:2 [▼
              "attributes" => []
              "value" => "Sno"
            ]
            "friend-of" => array:2 [▼
              "attributes" => []
              "value" => "Pep"
            ]
            "since" => array:2 [▼
              "attributes" => []
              "value" => "1950-10-04"
            ]
            "qualification" => array:2 [▼
              "attributes" => []
              "value" => "extroverted beagle"
            ]
          ]
          "attributes" => []
        ]
        1 => array:2 [▼
          "value" => array:4 [▼
            "name2" => array:2 [▼
              "attributes" => []
              "value" => "Pep"
            ]
            "friend-of" => array:2 [▼
              "attributes" => []
              "value" => "Sno"
            ]
            "since" => array:2 [▼
              "attributes" => []
              "value" => "1966-08-22"
            ]
            "qualification" => array:2 [▼
              "attributes" => []
              "value" => "bold, brash and tomboyish"
            ]
          ]
          "attributes" => []
        ]
      ]
    ]
    "attributes" => []
  ]
]

The result I want is:

array:1 [▼
  "characters" => array:2 [▼
    "value" => array:2 [▼
      0 => array:1 [▼
        "character" => array:2 [▼
          "value" => array:4 [▼
            "name2" => array:2 [▼
              "attributes" => []
              "value" => "Sno"
            ]
            "friend-of" => array:2 [▼
              "attributes" => []
              "value" => "Pep"
            ]
            "since" => array:2 [▼
              "attributes" => []
              "value" => "1950-10-04"
            ]
            "qualification" => array:2 [▼
              "attributes" => []
              "value" => "extroverted beagle"
            ]
          ]
          "attributes" => []
        ]
      ]
      1 => array:1 [▼
        "character" => array:2 [▼
          "value" => array:4 [▼
            "name2" => array:2 [▼
              "attributes" => []
              "value" => "Pep"
            ]
            "friend-of" => array:2 [▼
              "attributes" => []
              "value" => "Sno"
            ]
            "since" => array:2 [▼
              "attributes" => []
              "value" => "1966-08-22"
            ]
            "qualification" => array:2 [▼
              "attributes" => []
              "value" => "bold, brash and tomboyish"
            ]
          ]
          "attributes" => []
        ]
      ]
    ]
    "attributes" => []
  ]
]

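Basically, I want the value key of characters to be an array of two items, each holding one character key. This should only happen when several elements with the same name appear on the same branch. As it stands, the character key is itself an array with 2 elements, which does not work in my case.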
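To make the difference between the two layouts concrete, here is a minimal sketch using hand-written, heavily simplified literals (they are not output of parseXml). The helper name regroupRepeated is hypothetical; it only illustrates how one level of the current layout maps onto the wanted one, and it assumes PHP 7.0 or later for the type declarations.

```php
<?php
// Simplified stand-ins for the two layouts (illustration only).
$current = ['value' => ['character' => [['value' => 'A'], ['value' => 'B']]]];
$wanted  = ['value' => [['character' => ['value' => 'A']], ['character' => ['value' => 'B']]]];

// Hypothetical helper: turn "tag => list of elements" into
// "list of (tag => element)" for one level of the array.
function regroupRepeated(array $value): array
{
    $out = [];
    foreach ($value as $tag => $items) {
        // A repeated tag is stored as a 0..n-1 indexed list of elements
        $isList = is_array($items)
            && $items !== []
            && array_keys($items) === range(0, count($items) - 1);

        if ($isList) {
            foreach ($items as $item) {
                $out[] = [$tag => $item];
            }
        } else {
            // Single occurrence: keep it under its tag name
            $out[$tag] = $items;
        }
    }
    return $out;
}

var_dump(regroupRepeated($current['value']) === $wanted['value']);
```

Tags that occur only once stay under their own key, which matches the requirement that the extra level appears only for repeated elements.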

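I have not managed to change the method above to reflect what I need, and I am not sure which approach to take. Producing an array like this from a DOMDocument instance seems rather complicated.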

2 Answers:

Answer 0 (score: 1)

I have made some changes to your function, but I am not sure whether this is what you need.

private function domNodeToArray($node, $xmlDocument)
{
    // Every element yields both keys, even when they are empty
    $output = ['value' => [], 'attributes' => []];

    switch ($node->nodeType) {
    case XML_CDATA_SECTION_NODE:
    case XML_TEXT_NODE:
        $output = trim($node->textContent);
        break;
    case XML_ELEMENT_NODE:
        for ($i = 0, $m = $node->childNodes->length; $i < $m; $i++) {
            $child = $node->childNodes->item($i);
            $v = $this->domNodeToArray($child, $xmlDocument);

            if (isset($child->tagName)) {
                $t = $child->tagName;

                if (isset($output['value'][$t])) {
                    // Second occurrence of this tag: move both entries
                    // onto their own numerically indexed level
                    $output['value'][] = [$t => $output['value'][$t]];
                    $output['value'][] = [$t => $v];
                    unset($output['value'][$t]);
                } elseif (isset($output['value'][0][$t])) {
                    // Third and later occurrences: append to that level
                    // (otherwise they would land under the tag key again)
                    $output['value'][] = [$t => $v];
                } else {
                    $output['value'][$t] = $v;
                }
            } elseif (($v && is_string($v)) || $v === '0') {
                // Non-empty text content replaces the empty default value
                $output['value'] = htmlspecialchars((string)$v, ENT_XML1 | ENT_COMPAT, 'UTF-8');
            }
        }

        if ($node->attributes->length) {
            foreach ($node->attributes as $attrName => $attrNode) {
                $output['attributes'][$attrName] = (string) $attrNode->value;
            }
        }

        break;
    }

    return $output;
}

Output:

array:1 [▼
  "characters" => array:2 [▼
    "value" => array:2 [▼
      0 => array:1 [▼
        "character" => array:2 [▼
          "value" => array:4 [▼
            "name2" => array:2 [▼
              "value" => "Sno"
              "attributes" => []
            ]
            "friend-of" => array:2 [▼
              "value" => "Pep"
              "attributes" => []
            ]
            "since" => array:2 [▼
              "value" => "1950-10-04"
              "attributes" => []
            ]
            "qualification" => array:2 [▼
              "value" => "extroverted beagle"
              "attributes" => []
            ]
          ]
          "attributes" => []
        ]
      ]
      1 => array:1 [▼
        "character" => array:2 [▼
          "value" => array:4 [▼
            "name2" => array:2 [▼
              "value" => "Pep"
              "attributes" => []
            ]
            "friend-of" => array:2 [▼
              "value" => "Sno"
              "attributes" => []
            ]
            "since" => array:2 [▼
              "value" => "1966-08-22"
              "attributes" => []
            ]
            "qualification" => array:2 [▼
              "value" => "bold, brash and tomboyish"
              "attributes" => []
            ]
          ]
          "attributes" => []
        ]
      ]
    ]
    "attributes" => []
  ]
]

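Answer 1 (score: 1)

The problem is deciding when to add a new level and when to keep adding data to the current one. I have changed this logic and added comments in the code to help explain what happens and when.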

private function domNodeToArray($node, $xmlDocument)
{
    $output = [];
    switch ($node->nodeType)
    {
        case XML_CDATA_SECTION_NODE:
        case XML_TEXT_NODE:
            $output = trim($node->textContent);
            break;
        case XML_ELEMENT_NODE:
            for ($i = 0, $m = $node->childNodes->length; $i < $m; $i++)
            {
                $child = $node->childNodes->item($i);
                $v = $this->domNodeToArray($child, $xmlDocument);

                if (isset($child->tagName))
                {
                    $t = $child->tagName;

                    // Removed from the original:
                    // if (!isset($output['value'][$t]))
                    // {
                    //     $output['value'][$t] = [];
                    // }

                    // If the element already exists
                    if (isset($output['value'][$t]))
                    {
                        // Copy the existing value to new level
                        $output['value'][] = [$t => $output['value'][$t]];
                        // Add in new value
                        $output['value'][] = [$t => $v];
                        // Remove old element
                        unset($output['value'][$t]);
                    }
                    // If this has already been added at a new level
                    elseif (isset($output['value'][0][$t]))
                    {
                        // Add it to existing extra level
                        $output['value'][] = [$t => $v];
                    }
                    else
                    {
                        $output['value'][$t] = $v;
                    }
                }
                else if ($v || $v === '0')
                {
                    $output['value'] = htmlspecialchars((string)$v, ENT_XML1 | ENT_COMPAT, 'UTF-8');
                }
            }

            if (isset($output['value']) && $node->attributes->length && !is_array($output['value']))
            {
                $output = ['value' => $output['value']];
            }

            if (!$node->attributes->length && isset($output['value']) && !is_array($output['value']))
            {
                $output = ['attributes' => [], 'value' => $output['value']];
            }

            if ($node->attributes->length)
            {
                $a = [];
                foreach ($node->attributes as $attrName => $attrNode)
                {
                    $a[$attrName] = (string)$attrNode->value;
                }
                $output['attributes'] = $a;
            }
            else
            {
                $output['attributes'] = [];
            }
            break;
    }

    return $output;
}

I have tested this with the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<characters>
   <character>
      <name2>Sno</name2>
      <friend-of>Pep</friend-of>
      <since>1950-10-04</since>
      <qualification>extroverted beagle</qualification>
   </character>
   <character>
      <name2>Pep</name2>
      <friend-of>Sno</friend-of>
      <since>1966-08-22</since>
      <qualification>bold, brash and tomboyish</qualification>
   </character>
   <character>
      <name2>Pep2</name2>
      <friend-of>Sno</friend-of>
      <since>1966-08-23</since>
      <qualification>boldish, brashish and tomboyish</qualification>
   </character>
</characters>

This checks that the <character> elements are all added at the correct level.
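For anyone who wants to try the grouping logic outside a class, the following is a minimal standalone sketch under a few assumptions: nodeToArray is a hypothetical free function rather than the private method above, mixed content (text next to child elements) is simply dropped, and PHP 7.0 or later is assumed for the type declarations. The XML is a shortened version of the three-character sample.

```php
<?php
// Hypothetical standalone variant of the grouping logic (not the answer's method).
function nodeToArray(DOMElement $node): array
{
    $output = ['value' => [], 'attributes' => []];

    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $t = $child->tagName;
            $v = nodeToArray($child);

            if (!is_array($output['value'])) {
                // Assumption: text seen before a child element is discarded
                $output['value'] = [];
            }

            if (isset($output['value'][$t])) {
                // Second occurrence: move both onto a numeric level
                $output['value'][] = [$t => $output['value'][$t]];
                $output['value'][] = [$t => $v];
                unset($output['value'][$t]);
            } elseif (isset($output['value'][0][$t])) {
                // Later occurrences: append to the numeric level
                $output['value'][] = [$t => $v];
            } else {
                // First occurrence: store under the tag name
                $output['value'][$t] = $v;
            }
        } elseif ($child instanceof DOMText) { // also matches CDATA sections
            $text = trim($child->textContent);
            // Keep text only while no child element has been stored
            if ($text !== '' && ($output['value'] === [] || is_string($output['value']))) {
                $output['value'] = htmlspecialchars($text, ENT_XML1 | ENT_COMPAT, 'UTF-8');
            }
        }
    }

    foreach ($node->attributes as $name => $attr) {
        $output['attributes'][$name] = (string) $attr->value;
    }

    return $output;
}

$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<characters>
   <character><name2>Sno</name2></character>
   <character><name2>Pep</name2></character>
   <character><name2>Pep2</name2></character>
</characters>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$root = $doc->documentElement;
$result = [$root->tagName => nodeToArray($root)];

// Each <character> should sit under its own numeric index
foreach ($result['characters']['value'] as $index => $entry) {
    echo $index, ': ', $entry['character']['value']['name2']['value'], PHP_EOL;
}
```

The loop at the end lists one name per numeric index, which makes it easy to see whether the third <character> ended up on the same level as the first two.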