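I'm trying to stream-parse an XML file of roughly 4 GB and write part of it to a new XML file in PHP.
The ~4 GB XML document is structured as shown further down, and I want to keep only the <doc> elements together with their <title>, <url>, and <abstract> child elements.
But when I run this script, all I get is a file with one <doc /> per line. So it is basically copying the <doc> elements and making them self-closing, but not copying their children.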
<?php
$interestingNodes = array('title','url','abstract');
$xmlObject = new XMLReader();
$xmlObject->open('file.xml');
$xmlOutput = new XMLWriter();
$xmlOutput->openURI('destfile.xml');
$xmlOutput->setIndent(true);
$xmlOutput->setIndentString(" ");
$xmlOutput->startDocument('1.0', 'UTF-8');
while($xmlObject->read()){
    if($xmlObject->name == 'doc'){
        $xmlOutput->startElement('doc');
        $xmlObject->readInnerXML();
        if(array_search($xmlObject->name, $interestingNodes)){
            $xmlOutput->startElement($xmlObject->name);
            $xmlOutput->text($xmlObject->value);
            $xmlOutput->endElement(); //close the current node
        }
        $xmlOutput->endElement(); //close the doc node
    }
}
$xmlObject->close();
$xmlOutput->endDocument();
$xmlOutput->flush();
?>
This is what file.xml looks like:
<feed>
  <doc>
    <title>Title of first doc is here</title>
    <url>URL is here</url>
    <abstract>Abstract is here...</abstract>
    <links>
      <sublink>Link is here</sublink>
      <sublink>Link is here</sublink>
      <sublink>Link is here</sublink>
      <sublink>Link is here</sublink>
      <sublink>Link is here</sublink>
    </links>
  </doc>
  <doc>
    <title>Title of second doc is here</title>
    <url>URL is here</url>
    <abstract>Abstract is here...</abstract>
    <links>
      <sublink>Link is here</sublink>
      <sublink>Link is here</sublink>
      <sublink>Link is here</sublink>
      <sublink>Link is here</sublink>
      <sublink>Link is here</sublink>
    </links>
  </doc>
</feed>
This is what I want destfile.xml to look like:
<doc>
  <title>Title of first doc is here</title>
  <url>URL is here</url>
  <abstract>Abstract is here...</abstract>
</doc>
<doc>
  <title>Title of second doc is here</title>
  <url>URL is here</url>
  <abstract>Abstract is here...</abstract>
</doc>
However, when I run the script above, all I get is:
<doc />
<doc />
<doc />
<doc />
<doc />
<doc />
/* And many, many more <doc />s */
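Answer 0 (score: 0)
I believe the following will do what you want. In the original script, readInnerXML() only returns the inner markup as a string and does not advance the reader, so $xmlObject->name is still 'doc' when it is tested. array_search() also returns the matching key, which is 0 (falsy) for 'title'. The version below keeps reading node by node, tests names with in_array(), and takes the text with readString():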
<?php
$interestingNodes = array('title','url','abstract');
$xmlObject = new XMLReader();
$xmlObject->open('file.xml');
$xmlOutput = new XMLWriter();
$xmlOutput->openURI('destfile.xml');
$xmlOutput->setIndent(true);
$xmlOutput->setIndentString(" ");
$xmlOutput->startDocument('1.0', 'UTF-8');
while($xmlObject->read()){
    if($xmlObject->name == 'doc'){
        // Open <doc> on its start tag and close it on its end tag
        if($xmlObject->nodeType==XMLReader::END_ELEMENT) $xmlOutput->endElement();
        else $xmlOutput->startElement('doc');
    }
    // Match start tags only; a closing tag has the same name and would emit a second, empty element
    if($xmlObject->nodeType==XMLReader::ELEMENT && in_array($xmlObject->name, $interestingNodes)){
        $xmlOutput->startElement($xmlObject->name);
        $xmlOutput->text($xmlObject->readString()); // text content of the current element
        $xmlOutput->endElement(); //close the current node
    }
}
$xmlObject->close();
$xmlOutput->endDocument();
$xmlOutput->flush();
?>
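If you want to try the same loop without touching the 4 GB file, here is a minimal in-memory sketch. The helper name filterDocs() and the sample string are my own (hypothetical, not from the question). It reads from a string with XMLReader::XML() and writes to a buffer with XMLWriter::openMemory() instead of using files, and it assumes the elements you keep contain only text.

```php
<?php
// Hypothetical helper for experimenting: filters an XML string in memory.
// Assumes the kept elements hold plain text only.
function filterDocs(string $xml, array $keep): string
{
    $reader = new XMLReader();
    $reader->XML($xml);                 // parse from a string instead of a file

    $writer = new XMLWriter();
    $writer->openMemory();              // buffer the output instead of writing a file
    $writer->setIndent(true);
    $writer->setIndentString(' ');

    while ($reader->read()) {
        if ($reader->name === 'doc') {
            if ($reader->nodeType === XMLReader::ELEMENT) {
                $writer->startElement('doc');
                if ($reader->isEmptyElement) {
                    // <doc/> has no separate end tag, so close it right away
                    $writer->endElement();
                }
            } elseif ($reader->nodeType === XMLReader::END_ELEMENT) {
                $writer->endElement();
            }
        } elseif ($reader->nodeType === XMLReader::ELEMENT
            && in_array($reader->name, $keep, true)) {
            // Start tags only: copy the element name and its text content
            $writer->writeElement($reader->name, $reader->readString());
        }
    }

    $reader->close();
    return $writer->outputMemory();
}

// Made-up sample input, much smaller than the real feed
$sample = '<feed><doc><title>A</title><url>U</url>'
        . '<links><sublink>L</sublink></links></doc></feed>';
$out = filterDocs($sample, ['title', 'url', 'abstract']);
echo $out;
```

For the real file you would swap the string and memory buffer back to open() and openURI() as in the script above, so that only the current node is held in memory at any time.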