Need to parse a very large XML file in PHP

Time: 2015-07-09 10:38:33

Tags: php

Hello, I am trying to parse an XML file using XMLParser.

I need to end up with something like this:

productID: xerox_106r0116

name: Xerox zwart, tonercartridge met grote capaciteit (tot 32.000 pag.) (106R01163)

However, what I actually get is this:

text:xerox_106r0116

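text:Xerox zwart, tonercartridge met grote capaciteit (tot 32.000 pag.) (106R01163)

Does anyone know how to parse this correctly with XMLParser, so that each value is labeled with its element name instead of "text"?

The XML is below: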
<?xml version="1.0" encoding="utf-8"?>
<products>
<product>
<productID>xerox_106r01163</productID>
<name>Xerox zwart, tonercartridge met grote capaciteit (tot 32.000 pag.) (106R01163)</name>
<price currency="EUR">165.77</price>
<productURL>http://www.centralpoint.nl/tracker/index.php?tt=534_251713_1_&amp;r=http%3A%2F%2Fwww.centralpoint.nl%2Ftoners-laser-cartridges%2Fxerox%2Fzwart-tonercartridge-met-grote-capaciteit-tot-32000-pag-art-106r01163-num-17879%2F</productURL>
<imageURL>https://www02.cp-static.com/objects/low_pic/3/3a9/117949_toners-laser-cartridges-xerox-zwart-tonercartridge-met-grote-capaciteit-tot-32000-pag-106r01163.jpg</imageURL>
<description><![CDATA[Black Toner Cartridge, Phaser 7760
Our Phaser 7760
 toner cartridges utilize a revolutionary toner manufacturing process where toner is chemically grown and processed into very small and consistent particles, resulting in sharper, high-gloss image quality, an increased range of colors, enhanced fine-line detail and superior reliability. Our longer-life toner cartridges reduce the need for customer interaction, and the Black toner cartridges print up to 32,000 pages each at 5% average area coverage.]]></description>
<categories>
<category path="toners &amp; lasercartridges">toners &amp; lasercartridges</category>
</categories>
<additional>
<field name="brand">Xerox</field>
<field name="producttype">zwart, tonercartridge met grote capaciteit (tot 32.000 pag.)</field>
<field name="deliveryCosts">0.00</field>
<field name="SKU">106R01163</field>
<field name="brand_and_type">Xerox 106R01163</field>
<field name="stock">Op voorraad</field>
<field name="thumbnailURL">https://www02.cp-static.com/objects/thumb_pic/3/3a9/117949_toners-laser-cartridges-xerox-zwart-tonercartridge-met-grote-capaciteit-tot-32000-pag-106r01163.jpg</field>
<field name="deliveryTime">1 werkdag</field>
<field name="imageURLlarge">https://www02.cp-static.com/objects/high_pic/3/3a9/117949_toners-laser-cartridges-xerox-zwart-tonercartridge-met-grote-capaciteit-tot-32000-pag-106r01163.jpg</field>
<field name="categoryURL">http://www.centralpoint.nl/toners-laser-cartridges/</field>
<field name="EAN">0095205224016</field>
</additional>
</product>
</products>

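2 answers:

Answer 0 (score: 2)

For very large files, I use a combination of XMLReader (which advances through the document stream like a cursor, so the whole file is never held in memory) and SimpleXMLElement (to read each small fragment conveniently). In your case, it would look something like this: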

$xml = new XMLReader();
if(!$xml->open($FILE_NAME)){
    die("Error opening the XML file");
}

//Process XML with the product list
while($xml->read()){
    if($xml->nodeType==XMLReader::ELEMENT && $xml->name == 'product'){
        $product_xml = $xml->readOuterXml();

        //Getting attributes
        // libxml option flags are combined with bitwise OR (|), not logical AND (&&)
        $product = simplexml_load_string($product_xml, 'SimpleXMLElement', LIBXML_NOBLANKS | LIBXML_NOWARNING);
        $product_id = (string)$product->productID;
        $product_name = (string)$product->name;

        //Then do something with product_id and product_name...
        echo "ProductID: ".$product_id." name:".$product_name;
    }
}
$xml->close();

Hope this helps.
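
As a variation on the loop above, the same idea can be sketched as a generator that jumps from one `<product>` to the next with `XMLReader::next()`, so the reader does not descend into children that SimpleXML has already consumed. The function name `iterateProducts` is hypothetical and used only for illustration; the element name `product` is taken from the XML in the question.

```php
<?php
// Sketch: stream <product> elements one at a time with XMLReader.
// iterateProducts() is a hypothetical helper name, not a library function.
function iterateProducts(string $path): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Error opening the XML file: $path");
    }

    try {
        // Move the cursor forward to the first <product> start tag, if any.
        $found = false;
        while ($reader->read()) {
            if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'product') {
                $found = true;
                break;
            }
        }

        while ($found) {
            // Only the current <product> subtree is turned into a SimpleXMLElement.
            $product = simplexml_load_string($reader->readOuterXml());
            if ($product !== false) {
                yield $product;
            }
            // Skip the subtree just read and go to the next node named "product".
            $found = $reader->next('product');
        }
    } finally {
        $reader->close();
    }
}
```

It could then be consumed with `foreach (iterateProducts($FILE_NAME) as $product) { echo "productID: " . $product->productID . " name: " . $product->name . "\n"; }`, where `$FILE_NAME` is the same path variable used in the answer.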

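Answer 1 (score: 0)

This is what I use for XML parsing. It loads the XML document into a DOMDocument object, which you can then work with. Note that DOMDocument holds the whole document in memory. See if it helps: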

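    // Collect libxml errors internally instead of emitting PHP warnings
    libxml_use_internal_errors(true);
    $xml = new DOMDocument();
    $xml->preserveWhiteSpace = false;
    // loadXML() returns false on malformed XML rather than throwing,
    // so check the return value instead of relying on try/catch
    if ($xml->loadXML($string, LIBXML_NSCLEAN) === false) {
        throw new \Exception('Invalid XML structure');
    }
    return $xml;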
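
Once the document is loaded, one way to get labeled values out of it is DOMXPath. The following is a minimal sketch, assuming the `products/product` layout from the question and a document small enough to fit in memory; `extractProducts` is a hypothetical helper name, not part of the answer above.

```php
<?php
// Sketch: load an XML string into DOMDocument and read fields with DOMXPath.
// extractProducts() is a hypothetical helper name used only for illustration.
function extractProducts(string $xmlString): array
{
    // Collect parse errors internally, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false;
    $ok = $doc->loadXML($xmlString);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($ok === false) {
        throw new InvalidArgumentException('Invalid XML structure');
    }

    $xpath = new DOMXPath($doc);
    $products = [];
    foreach ($xpath->query('/products/product') as $node) {
        // string(...) returns the text content of the first matching child element.
        $products[] = [
            'productID' => $xpath->evaluate('string(productID)', $node),
            'name'      => $xpath->evaluate('string(name)', $node),
        ];
    }
    return $products;
}
```

Each entry of the returned array carries the element names as keys, which is the `productID: ... name: ...` pairing the question asks for.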