XML string is parsed incorrectly if it is not indented

Time: 2016-10-05 13:31:11

Tags: php xml xml-parsing

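I am trying to parse XML by collecting the values of the nodes and their attributes into an associative array. In the class below, convert_simple_xml_element_object_into_array is meant to do that job.

But something strange happens. If the input is properly indented XML, the returned associative array is correct. If a non-indented XML string is passed instead, the returned associative array is wrong and contains empty entries. What could be the reason?

Sample XML string: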

<?xml version="1.0"?>
<StreamWebInfo><UserInfo Username="a@b.com" AccountId="19"/><JobInfo Id="594" QualifiedFilePath="https://s.com/main_DIS_23009_1_v_2_1c2_2011_08_30.mpd" ParentContainerType="0" ContainerType="10" EndTime="2016-10-05 11:45:09"/><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="320" Height="240" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="320" Height="240" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="320" Height="240" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="480" Height="368" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="480" Height="368" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="480" Height="368" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="480" Height="368" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="480" Height="368" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="864" Height="480" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="864" Height="480" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="1280" Height="720" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="1280" Height="720" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="1280" Height="720" Bitrate="0" 
TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="1280" Height="720" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="1920" Height="1088" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="1920" Height="1088" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="1920" Height="1088" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="1920" Height="1088" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="1920" Height="1088" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="1920" Height="1088" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo></StreamWebInfo>

The class with the methods in question:

<?php

class _xml_parser {

    const EMPTY_STRING = '';
    const MAX_RECURSION_DEPTH_ALLOWED =  200;
    const SIMPLE_XML_ELEMENT_OBJECT_PROPERTY_FOR_ATTRIBUTES = '@attributes';


    /**
     * Get the SimpleXMLElement representation of the function input 
     * parameter that contains XML string. Convert the XML string 
     * contents to SimpleXMLElement type. SimpleXMLElement type is 
     * nothing but an object that can be processed with normal property 
     * selectors and (associative) array iterators.
     * 
     * @param string $xmlStringContents
     * @return SimpleXMLElement get_simple_xml_element returns a SimpleXMLElement object which 
     * contains an instance variable which itself is an associative array of 
     * several SimpleXMLElement objects.
     * 
     *  
     * @version 1.0.0
     */
    public static function get_simple_xml_element($xmlStringContents) {
        $simpleXmlElementObject = self::EMPTY_STRING;
        if('string' == gettype($xmlStringContents)) {
            $simpleXmlElementObject = simplexml_load_string($xmlStringContents);
        }
        return $simpleXmlElementObject; 
    }

    /**
     * This function accepts a SimpleXmlElementObject as a single argument and
     * converts the XML object into a PHP associative array. 
     * If the input XML is in tree (i.e. nested) format, this function will return an associative  
     * array (tree/nested) representation of that XML.
     * 
     * Note: It is a recursive function.
     *
     * @param SimpleXMLElement|array|string $simpleXmlElementObject
     * @param int $recursionDepth
     *
     * @return array|string|null If everything is successful, it returns an associative array containing
     *  the data collected from the XML format. Otherwise, it returns null.
     *
     * 
     */
    public static function convert_simple_xml_element_object_into_array($simpleXmlElementObject, &$recursionDepth=0) {
        // Keep an eye on how deeply we are involved in recursion.
        if ($recursionDepth > self::MAX_RECURSION_DEPTH_ALLOWED) {
            // Fatal error. Exit now.
            return(null);
        }

        if ($recursionDepth == 0) {
            if (!($simpleXmlElementObject instanceof SimpleXMLElement)) {
                // If the external caller doesn't call this function initially
                // with a SimpleXMLElement object, return now.
                return(null);
            } else {
                // Store the original SimpleXmlElementObject sent by the caller.
                // We will need it at the very end when we return from here.
                $callerProvidedSimpleXmlElementObject = $simpleXmlElementObject;
            }
        }   

        if ($simpleXmlElementObject instanceof SimpleXMLElement) {
            // Get a copy of the simpleXmlElementObject
            $copyOfsimpleXmlElementObject = $simpleXmlElementObject;
            // Get the object variables in the SimpleXmlElement object for us to iterate.
            $simpleXmlElementObject = get_object_vars($simpleXmlElementObject);
        }

        // It needs to be an array of object variables.
        if (is_array($simpleXmlElementObject)) {
            // Initialize the result array.
            $resultArray = array();
            // Is the input array size 0? Then, we reached the rare CDATA text if any.
            if (count($simpleXmlElementObject) <= 0) {
                // Let us return the lonely CDATA. It could even be whitespaces.
                return (trim(strval($copyOfsimpleXmlElementObject)));
            }

            // Let us walk through the child elements now.
            foreach($simpleXmlElementObject as $key=>$value) {
                // Uncomment the following block of code if XML attributes are
                // NOT required to be returned as part of the result array.
                /*
                 if((is_string($key)) && ($key == self::SIMPLE_XML_ELEMENT_OBJECT_PROPERTY_FOR_ATTRIBUTES)) {
                    continue;
                 }
                 */
                // Let us recursively process the current element we just visited.
                // Increase the recursion depth by one.
                $recursionDepth++;
                $resultArray[$key] = self::convert_simple_xml_element_object_into_array($value, $recursionDepth);
                // Decrease the recursion depth by one.
                $recursionDepth--;
            } 

            if ($recursionDepth == 0) {
                // That is it. Heading to the exit now.
                // Set the XML root element name as the root [top-level] key of
                // the associative array that we are going to return to the caller of this
                // recursive function.
                $tempArray = $resultArray;
                $resultArray = array();
                $resultArray[$callerProvidedSimpleXmlElementObject->getName()] = $tempArray;
            }

            return ($resultArray);
        } else {
            // We are now looking at either the XML attribute text or
            // the text between the XML tags.
            return (trim(strval($simpleXmlElementObject)));
        } // End of else
    }

    /**
     * Converts XML to JSON
     * @param SimpleXMLElement $simpleXmlElementObject
     * @return JSON string
     *  
     */
    public static function xml2json($simpleXmlElementObject) {
        $json_from_xml = null;
        if($simpleXmlElementObject instanceof SimpleXMLElement) {
            $xml_map = self::convert_simple_xml_element_object_into_array($simpleXmlElementObject);
            // Encode the array built above ($xml_map, not the undefined $xmlMap).
            $json_from_xml = json_encode($xml_map);
        }
        return $json_from_xml;
    }

}

For the XML above, the returned array has a key named ProfileInfo, but it contains a map with empty key-value pairs.
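One way to narrow this down is to read the same non-indented document through SimpleXML's own property and attribute access rather than through get_object_vars(). The sketch below uses a shortened sample in the same shape as the XML above (the shortening and the variable names are assumptions made for illustration). If the count and the widths come out as expected, the parser has the data and the problem lies in the conversion step.

```php
<?php
// Shortened, non-indented sample in the same shape as the question's XML.
$xmlString = '<StreamWebInfo><UserInfo Username="a@b.com" AccountId="19"/>'
    . '<ProfileInfo><VideoTrackWebInfo CodecType="3" Width="320" Height="240"/></ProfileInfo>'
    . '<ProfileInfo><VideoTrackWebInfo CodecType="3" Width="480" Height="368"/></ProfileInfo>'
    . '</StreamWebInfo>';

$xml = simplexml_load_string($xmlString);
if ($xml === false) {
    throw new RuntimeException('XML could not be parsed');
}

// Count the repeated <ProfileInfo> elements via property access.
$profileCount = count($xml->ProfileInfo);

// Read the Width attribute of each nested <VideoTrackWebInfo>.
$widths = array();
foreach ($xml->ProfileInfo as $profile) {
    $widths[] = (string) $profile->VideoTrackWebInfo['Width'];
}

echo $profileCount, "\n";
echo implode(',', $widths), "\n";
```

For this sample the script should print 2 followed by 320,480, with or without indentation in the input.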

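1 answer:

Answer 0 (score: 0)

In the convert_simple_xml_element_object_into_array function, you have to check whether a SimpleXMLElement object that has no attributes has children. If it does, you have to call convert_simple_xml_element_object_into_array again for each child.

Replacing the old code with the new code should return the correct array:

Old code: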

// Is the input array size 0? Then, we reached the rare CDATA text if any.
if (count($simpleXmlElementObject) <= 0) {
  // Let us return the lonely CDATA. It could even be whitespaces.
  return (trim(strval($copyOfsimpleXmlElementObject)));
}

New code:

// Is the input array size 0? Then, we reached the rare CDATA text if any.
if (count($simpleXmlElementObject) <= 0) {
  // Check whether the object has child elements. If so, recurse into each child.
  if(($copyOfsimpleXmlElementObject instanceof SimpleXMLElement) && (count($copyOfsimpleXmlElementObject->children()) >=1)) {
    foreach($copyOfsimpleXmlElementObject->children() as $child){
      $recursionDepth++;
      // Note: children that share a tag name overwrite each other under this key.
      $resultArray[$child->getName()] = self::convert_simple_xml_element_object_into_array($child, $recursionDepth);
      $recursionDepth--;
    }
  }
  else{
    // Let us return the lonely CDATA. It could even be whitespaces.
    return (trim(strval($copyOfsimpleXmlElementObject)));
  }
}
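
As an alternative to patching the get_object_vars() path, the conversion can be written against children() and attributes() only. The following is a minimal sketch, not a drop-in replacement for the class above: xml_element_to_array is a hypothetical helper name, the '@attributes' key and the depth limit of 200 simply mirror the constants in the question, the root element name is not added as a top-level key, and text inside elements that also have attributes or children is ignored. Repeated tags such as ProfileInfo are collected into a list instead of overwriting each other.

```php
<?php
// Hypothetical helper (not part of the original class): converts a
// SimpleXMLElement into a nested array using only children() and attributes().
function xml_element_to_array(SimpleXMLElement $element, $depth = 0, $maxDepth = 200)
{
    if ($depth > $maxDepth) {
        return null; // same safety-limit idea as MAX_RECURSION_DEPTH_ALLOWED
    }

    $result = array();

    // Collect attributes under the '@attributes' key used in the question.
    foreach ($element->attributes() as $name => $value) {
        $result['@attributes'][$name] = (string) $value;
    }

    // Group child elements by tag name so repeated tags are not overwritten.
    $grouped = array();
    foreach ($element->children() as $child) {
        $grouped[$child->getName()][] = xml_element_to_array($child, $depth + 1, $maxDepth);
    }
    foreach ($grouped as $name => $items) {
        // A single child stays a plain value; repeated children become a list.
        $result[$name] = (count($items) === 1) ? $items[0] : $items;
    }

    // Element without attributes or children: return its trimmed text.
    if (count($result) === 0) {
        return trim((string) $element);
    }
    return $result;
}

// The same document, once compact and once indented.
$compact  = '<Root><Item><Leaf Width="320"/></Item><Item><Leaf Width="480"/></Item></Root>';
$indented = "<Root>\n  <Item>\n    <Leaf Width=\"320\"/>\n  </Item>\n"
    . "  <Item>\n    <Leaf Width=\"480\"/>\n  </Item>\n</Root>";

$fromCompact  = xml_element_to_array(simplexml_load_string($compact));
$fromIndented = xml_element_to_array(simplexml_load_string($indented));

// Both inputs should produce the same nested array.
var_dump($fromCompact === $fromIndented);
```

Because children() iterates element nodes only, the whitespace between tags in the indented version does not show up in the result, so both calls are expected to yield the same array.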