PHP SimpleXML: trying to handle a fairly complex file

Date: 2011-04-15 13:53:42

Tags: php xml parsing crystal-reports simplexml

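I'm struggling here. I've spent most of today trying to parse an auto-generated XML document exported from Crystal Reports. I'm not great with XML, so please bear with me.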

The file I have to work with has the following structure:

<?xml version="1.0" encoding="UTF-8" ?>
<FormattedReport xmlns = 'urn:crystal-reports:schemas' xmlns:xsi = 'http://www.w3.org/2000/10/XMLSchema-instance'>
    <FormattedAreaPair Level="0" Type="Report">
        <FormattedAreaPair Level="1" Type="Details">
            <FormattedArea Type="Details">
                <FormattedSections>
                    <FormattedSection SectionNumber="0">
                        <FormattedReportObjects>
                            <FormattedReportObject xsi:type="CTFormattedField" Type="xsd:long" FieldName="{tblCon.ManifestNR}"><ObjectName>ManifestNR1</ObjectName>
                                <FormattedValue>1,907</FormattedValue>
                                <Value>1907.00</Value>
                            </FormattedReportObject>
                            <FormattedReportObject xsi:type="CTFormattedField" Type="xsd:timeInstant" FieldName="{tblCon.ShippingDate}"><ObjectName>ShippingDate1</ObjectName>
                                <FormattedValue>14/04/2011</FormattedValue>
                                <Value>2011-04-14T00:00:00</Value>
                            </FormattedReportObject>
                            ... so on and so forth ...
                        </FormattedReportObjects>
                    </FormattedSection>
                </FormattedSections>
            </FormattedArea>
        </FormattedAreaPair>
        <FormattedAreaPair Level="1" Type="Details">
            <FormattedArea Type="Details">
                <FormattedSections>
                    <FormattedSection SectionNumber="0">
                        <FormattedReportObjects>
                            <FormattedReportObject xsi:type="CTFormattedField" Type="xsd:long" FieldName="{tblCon.ManifestNR}"><ObjectName>ManifestNR1</ObjectName>
                                <FormattedValue>1,907</FormattedValue>
                                <Value>1907.00</Value>
                            </FormattedReportObject>
                            <FormattedReportObject xsi:type="CTFormattedField" Type="xsd:timeInstant" FieldName="{tblCon.ShippingDate}"><ObjectName>ShippingDate1</ObjectName>
                                <FormattedValue>14/04/2011</FormattedValue>
                                <Value>2011-04-14T00:00:00</Value>
                            </FormattedReportObject>
                            ... so on and so forth ...
                        </FormattedReportObjects>
                    </FormattedSection>
                </FormattedSections>
            </FormattedArea>
        </FormattedAreaPair>
    </FormattedAreaPair>
</FormattedReport>

What I'm trying to do is call a PHP function that parses the XML and eventually stores it in a SQL database.

For example:

ManifestNR: 1903 ShippingDate: 12/04/2011 CarrierID: TNT03 TrackingRef: 234234232 ... and so on for each record ...

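I started out trying to do this with DOM, then stumbled on SimpleXML. I've read several tutorials and searched here for implementations, but I can't seem to access the data in the final nodes (or any other data, to be honest). Is SimpleXML a no-go for this kind of structure?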

The latest PHP I've tried is:

<?php

if (file_exists('tracking.xml')) {
    $xml = simplexml_load_file('tracking.xml');

  //  print_r($xml);

   foreach( $xml as $FormattedReport->FormattedAreaPair->FormattedAreaPair ) 
        {
        foreach($FormattedReport as $node->FormattedArea->FormattedSections->FormattedSection->FormattedReportObjects)
        echo $node->FormattedReportObject->Value;
        }

} else {
    exit('Failed to open xml');
}
?>

I've tried stripping it back to basics, but still no luck: nothing is echoed.

Thanks for your time!

Solved

A little direction for anyone in a similar situation:

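  1. Ignore the root node: the variable you get back when loading the XML string/file (e.g. $xml) already is the root.
  2. If you have nested groups, first take a node for the parent, e.g. $xml->FormattedAreaPair->FormattedAreaPair as $parentnode.
  3. Use your parent node to loop through all of its children.
  4. If you need an attribute, access it like this: (string) $node['FieldName']
  5. Compare the retrieved attribute with a string, then handle the result.
  6. Don't pull your hair out.
  7. (Sorry for the horrible code formatting.)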

    <?php
    
    if (file_exists('tracking.xml')) { 
        $xml = simplexml_load_file('tracking.xml');
    
        //print_r($xml); 
        foreach( $xml->FormattedAreaPair->FormattedAreaPair as $parentnode ) { 
            foreach($parentnode->FormattedArea->FormattedSections->FormattedSection->FormattedReportObjects->FormattedReportObject as $node){ 
                //echo "FormattedValue: ".$node->FormattedValue."<br />"; 
                switch((string) $node['FieldName']){ 
                    case '{tblCon.ManifestNR}': 
                        echo 'Manifest: '.$node->FormattedValue."<br />"; 
                        break; 
                    case '{tblCon.ShippingDate}': 
                        echo 'Shipping Date: '.$node->FormattedValue."<br />"; 
                        break; 
                    case '{tblCon.CarrierID}': 
                        echo 'Carrier ID: '.$node->FormattedValue."<br />"; 
                        break; 
                    case '{tblCon.CustConRefTX}': 
                        echo 'Customer Reference: '.$node->FormattedValue."<br />"; 
                        break; 
                    case '{tblCon.ServiceCodeTX}': 
                        echo 'Service Code: '.$node->FormattedValue."<br />"; 
                        break; 
                    case '{tblCon.TotalWeightNR}': 
                        echo 'Total Weight: '.$node->FormattedValue."<br />"; 
                        break; 
                    case '{tblCon.ValueNR}': 
                        echo 'Value: '.$node->FormattedValue."<br />"; 
                        break; 
                    case '{tblCon.TotalVolumeNR}': 
                        echo 'Total Volume: '.$node->FormattedValue."<br />"; 
                        break; 
                    case '{tblCon.GoodsDesc}': 
                        echo 'Goods Description: '.$node->FormattedValue."<br />"; 
                        break; 
                    case '{tblConAddr.ReceiverNameTX}': 
                        echo 'Receiver Name: '.$node->FormattedValue."<br />"; 
                        break; 
                    case '{@SalesOrder}': 
                        echo 'Sales Order: '.$node->FormattedValue."<br />"; 
                        break; 
                    case '{@TrackingReference}': 
                        echo 'Tracking Reference: '.$node->FormattedValue."<br />"; 
                        break;
                }
            }
        echo "---------------------------- <br />";
        }
    } 
    else { 
        exit('Failed to open xml'); 
    }
    
    ?>
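Since the stated goal is to store each record in a SQL database, the long switch above could be collapsed into a lookup table that builds one associative array per Details pair. The following is a minimal sketch, assuming the same tracking.xml layout; the extractRows() function and the column names passed to it are hypothetical, not part of the original solution. It also skips Level 1 pairs whose Type is not "Details", in case the export contains header or footer pairs.

```php
<?php
/**
 * Hypothetical helper: turns each Details FormattedAreaPair into one
 * associative array, using $columns to map FieldName values to column names.
 * Fields not listed in $columns are skipped.
 */
function extractRows(SimpleXMLElement $xml, array $columns): array
{
    $rows = [];
    // $xml is already the <FormattedReport> root, so start at its children.
    foreach ($xml->FormattedAreaPair->FormattedAreaPair as $pair) {
        if ((string) $pair['Type'] !== 'Details') {
            continue; // ignore non-Details pairs, if the export has any
        }
        $row = [];
        $objects = $pair->FormattedArea->FormattedSections->FormattedSection
            ->FormattedReportObjects->FormattedReportObject;
        foreach ($objects as $node) {
            $name = (string) $node['FieldName']; // unprefixed attribute -> string
            if (isset($columns[$name])) {
                $row[$columns[$name]] = (string) $node->FormattedValue;
            }
        }
        $rows[] = $row;
    }
    return $rows;
}
```

Called as extractRows(simplexml_load_file('tracking.xml'), ['{tblCon.ManifestNR}' => 'manifest_nr', '{tblCon.ShippingDate}' => 'shipping_date']), it returns one array per record, which can then be bound to a prepared INSERT statement (for example with PDO). For storage, the raw Value element (e.g. 2011-04-14T00:00:00) may be a better source than the display-formatted FormattedValue.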
    

3 answers:

Answer 0 (score: 3)

The examples in the Manual should be enough (especially Example #4). You seem like a smart enough guy; the problem is just that you're doing it wrong.

example.php

<?php
$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<movies>
 <movie>
  <title>PHP: Behind the Parser</title>
  <characters>
   <character>
    <name>Ms. Coder</name>
    <actor>Onlivia Actora</actor>
   </character>
   <character>
    <name>Mr. Coder</name>
    <actor>El Act&#211;r</actor>
   </character>
  </characters>
  <plot>
   So, this language. It's like, a programming language. Or is it a
   scripting language? All is revealed in this thrilling horror spoof
   of a documentary.
  </plot>
  <great-lines>
   <line>PHP solves all my web problems</line>
  </great-lines>
  <rating type="thumbs">7</rating>
  <rating type="stars">5</rating>
 </movie>
</movies>
XML;
?>

Example #4

<?php
include 'example.php';

$xml = new SimpleXMLElement($xmlstr);

/* For each <character> node, we echo a separate <name>. */
foreach ($xml->movie->characters->character as $character) {
   echo $character->name, ' played by ', $character->actor, PHP_EOL;
}

?>

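Note that in a foreach, the path to the particular kind of node you want to iterate over goes before "as". The variable after "as" is just an (empty) variable that holds the current node on each iteration.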
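The same "path to the node type" idea can also be expressed with XPath. One caveat for the Crystal Reports file: SimpleXML's -> access resolves the document's unprefixed default namespace (urn:crystal-reports:schemas) transparently, but XPath does not, so a prefix has to be bound to that URI with registerXPathNamespace() before any query matches. A minimal sketch; the cr prefix and the valuesByField() name are arbitrary choices, and the field name is placed into the query unescaped, so it must not contain a double quote.

```php
<?php
/**
 * Hypothetical helper: returns every FormattedValue whose parent
 * FormattedReportObject has the given FieldName, in document order.
 */
function valuesByField(SimpleXMLElement $xml, string $fieldName): array
{
    // 'cr' is an arbitrary prefix bound to the report's default namespace;
    // without it, //FormattedReportObject would match nothing.
    $xml->registerXPathNamespace('cr', 'urn:crystal-reports:schemas');
    $nodes = $xml->xpath(sprintf(
        '//cr:FormattedReportObject[@FieldName="%s"]/cr:FormattedValue',
        $fieldName
    ));
    // xpath() returns false/null on error; treat that as "no matches".
    return array_map('strval', $nodes ?: []);
}
```

For example, valuesByField(simplexml_load_file('tracking.xml'), '{tblCon.ManifestNR}') would list every manifest number in the file without walking the FormattedAreaPair chain by hand.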

Answer 1 (score: 1)

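How to access namespaced attributes such as i:nil (XMLSchema-instance) with SimpleXML:

XML:

<item i:nil="true"/>

PHP:

// Casting the attribute object with (bool) gives true whenever the attribute exists,
// even for i:nil="false", so compare its string value instead.
$isNil = ((string) $item->attributes('i', true)->nil) === 'true';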
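Applied to the question's file, the same technique reads xsi:type, while the unprefixed FieldName and Type attributes use plain array access. A sketch with a hypothetical fieldTypes() helper. Note that the report declares the older http://www.w3.org/2000/10/XMLSchema-instance URI, so if you select the namespace by URI instead of by prefix (attributes($uri) without the second argument), it has to be that exact string.

```php
<?php
/**
 * Hypothetical helper: maps each FieldName to its xsi:type (prefixed
 * attribute) and its Type (unprefixed attribute).
 */
function fieldTypes(SimpleXMLElement $xml): array
{
    $types = [];
    foreach ($xml->FormattedAreaPair->FormattedAreaPair as $pair) {
        $objects = $pair->FormattedArea->FormattedSections->FormattedSection
            ->FormattedReportObjects->FormattedReportObject;
        foreach ($objects as $obj) {
            $types[(string) $obj['FieldName']] = [
                // Prefixed attribute: select the namespace by its prefix first.
                'xsiType'  => (string) $obj->attributes('xsi', true)->type,
                // Unprefixed attribute: plain array access is enough.
                'dataType' => (string) $obj['Type'],
            ];
        }
    }
    return $types;
}
```

Because later records overwrite earlier keys, the result has one entry per distinct FieldName, which is enough to decide how to convert each column (e.g. xsd:timeInstant to a date).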

Answer 2 (score: 0)

The file I'm working with is about 1GB, so I can't load the whole XML file at once. Here is a CI (CodeIgniter) controller I wrote to parse Crystal Reports XML.

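<?php

class Parse_crystal_reports_xml extends CI_Controller {

    function index(){
        $base_path = "/path/to/xml/";
        $xml_file = "xml_file.xml";
        $file_header = '<?xml version="1.0" encoding="UTF-8" ?>';
        // Every record starts with this tag, so split the raw text on it.
        $separator = '<FormattedAreaPair Level="1" Type="Details">';
        $xml_data = explode($separator, str_replace($file_header, '', file_get_contents($base_path.$xml_file)));
        // The fragments lose the root's xmlns:xsi declaration, so strip the
        // prefixes instead of leaving undeclared namespace prefixes behind.
        $bad_names = array('xsi:','xsd:');
        foreach($xml_data as $block_num => $block) : 
            // Block 0 is everything before the first Details pair (root and outer pair tags).
            if(!$block_num) : continue; endif;
            // Re-attach the header and opening tag so each block parses as its own document.
            $fields = new SimpleXMLElement(str_replace($bad_names, '', $file_header."\n".$separator.$block));
            $temp_array = array();
            foreach($fields->FormattedArea->FormattedSections->FormattedSection->FormattedReportObjects->FormattedReportObject as $field_num => $field) :
                // print_r($field);
                // Key is the slugified FieldName, e.g. {tblCon.ManifestNR} -> tblcon_manifestnr
                $temp_array[$this->make_slug((string)$field['FieldName'])] = $this->clean_word((string)$field->FormattedValue);
            endforeach;
            // print_r($fields);
            print_r($temp_array);
            // Debugging: stop after the first record. Before removing this, note that the
            // last block also carries the outer closing tags (</FormattedAreaPair>,
            // </FormattedReport>) and will probably need trimming before it parses.
            die;
        endforeach;
    }

    // Lowercase, with runs of non-word characters collapsed to "_".
    function make_slug($string){
        return strtolower(trim(preg_replace('/\W+/', '_', $string), '_'));
    }

    // Collapse internal whitespace and trim the ends.
    function clean_word($string){
        return trim(preg_replace('/\s+/', ' ', $string));
    }
}
?>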
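One thing to be aware of with the controller above: file_get_contents() still reads the entire file into a single string (and explode() makes a second copy), so what it avoids is building one SimpleXML tree for the whole document, not reading the whole file. PHP's built-in XMLReader pulls nodes from the file as it goes, which keeps memory bounded by the size of one record. The following is a minimal sketch of that swapped-in technique, not the answerer's method; readCrystalRows() is a hypothetical name, and it assumes each record is a FormattedAreaPair with Type="Details" as in the question.

```php
<?php
/**
 * Hypothetical streaming reader: yields one [FieldName => FormattedValue]
 * array per Details FormattedAreaPair, holding only that pair in memory.
 */
function readCrystalRows(string $path): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }

    $ok = $reader->read();
    while ($ok) {
        $isDetailsPair = $reader->nodeType === XMLReader::ELEMENT
            && $reader->localName === 'FormattedAreaPair'
            && $reader->getAttribute('Type') === 'Details';

        if (!$isDetailsPair) {
            $ok = $reader->read(); // advance to the next node (descends into children)
            continue;
        }

        // Materialise just this subtree and hand it to SimpleXML.
        $doc  = new DOMDocument();
        $pair = simplexml_import_dom($doc->importNode($reader->expand(), true));

        $row = [];
        $objects = $pair->FormattedArea->FormattedSections->FormattedSection
            ->FormattedReportObjects->FormattedReportObject;
        foreach ($objects as $field) {
            $row[(string) $field['FieldName']] = trim((string) $field->FormattedValue);
        }
        yield $row;

        $ok = $reader->next(); // jump to the next sibling, skipping this pair's subtree
    }
    $reader->close();
}
```

Usage would be along the lines of foreach (readCrystalRows('/path/to/xml/xml_file.xml') as $row) { ... }, inserting each row as it arrives. The loop calls next() only after a pair has been consumed and read() otherwise; calling next() inside a while ($reader->read()) loop is a common way to accidentally skip every other record.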