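I'm using SimpleXML in PHP to parse XML, but I've found that some of the files I need to parse contain errors. I could edit each offending file by hand, of course, but there are more than 10,000 files and that would take me forever.
OK, about the error. When you try to open the XML file in a browser, the following message is shown:
Warning: simplexml_load_string(): Entity: line 2: parser error : Specification mandate value for attribute Inspection...
I found that the following tags trigger the error (here: Transport instead of Inspection):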
<Public Transport Rules>
<PublicTransport id="0">
<Issued>null</Issued>
<Files><localfile>
<location>Citybus</location>
<format>Events</format>
</localfile>
</Files>
</PublicTransport>
</Public Transport Rules>
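Apparently the spaces inside the tag names cause the problem. These tags occur more than once in each file.
I think simplexml parses what it sees in the browser (at face value), so if your XML file has a problem, it won't parse properly. I'd like PHP to parse by reading the source file, and perhaps edit the file from there. But it seems any fopen call opens what you read on the browser page.
I've been stuck on this problem for a while now. Any input would be greatly appreciated.
Thanks!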
Answer 0 (score: 2)
If you can live with the tags that contain spaces being renamed, Tidy is a good option, because it works with XML as well:
$xml = simplexml_load_string(
tidy_repair_string($string, ['input-xml' => 1])
);
echo "SimpleXML::asXML():\n", $xml->asXML(), "\n\n";
This renames the tag and turns the extra words into attributes:
SimpleXML::asXML():
<?xml version="1.0"?>
<Public Transport="" Rules="">
<PublicTransport id="0">
<Issued>null</Issued>
<Files><localfile> <location>Citybus</location>
<format>Events</format> </localfile></Files>
</PublicTransport>
</Public>
There are more options for indentation and so on. Here is a complete example:
<?php
/**
* How to parse XML files with errors using Simplexml in PHP?
*
* @link http://stackoverflow.com/q/15620492/367456
*/
$string = '<?xml version="1.0" ?>
<Public Transport Rules>
<PublicTransport id="0">
<Issued>null</Issued>
<Files><localfile>
<location>Citybus</location>
<format>Events</format>
</localfile>
</Files>
</PublicTransport>
</Public Transport Rules>';
echo "Broken:\n", $string, "\n\n";
$fixed = tidy_repair_string($string, ['input-xml' => 1, 'output-xml' => 1, 'indent' => 1]);
echo "Fixed:\n", $fixed, "\n\n";
$xml = simplexml_load_string(tidy_repair_string($string, ['input-xml' => 1]));
echo "SimpleXML::asXML():\n", $xml->asXML(), "\n\n";
Output:
Broken:
<?xml version="1.0" ?>
<Public Transport Rules>
<PublicTransport id="0">
<Issued>null</Issued>
<Files><localfile>
<location>Citybus</location>
<format>Events</format>
</localfile>
</Files>
</PublicTransport>
</Public Transport Rules>
Fixed:
<?xml version="1.0"?>
<Public Transport="" Rules="">
<PublicTransport id="0">
<Issued>null</Issued>
<Files><localfile>
<location>Citybus</location>
<format>Events</format> </localfile></Files>
</PublicTransport>
</Public>
SimpleXML::asXML():
<?xml version="1.0"?>
<Public Transport="" Rules="">
<PublicTransport id="0">
<Issued>null</Issued>
<Files><localfile> <location>Citybus</location>
<format>Events</format> </localfile></Files>
</PublicTransport>
</Public>
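Once the document is repaired, it can be read like any other SimpleXML tree. Here is a minimal sketch of that, assuming the "Fixed" output above is pasted in verbatim (Tidy is not called here, and the variable names are only illustrative). It also shows that the original tag name can be pieced back together from the element name plus the empty attributes Tidy created:

```php
<?php
// The repaired document from the output above, pasted in as-is.
$fixed = '<?xml version="1.0"?>
<Public Transport="" Rules="">
<PublicTransport id="0">
<Issued>null</Issued>
<Files><localfile> <location>Citybus</location>
<format>Events</format> </localfile></Files>
</PublicTransport>
</Public>';

$xml = simplexml_load_string($fixed);

// Rebuild the original tag name: element name first, then the names of
// the empty attributes that Tidy made out of the remaining words.
$nameParts = [$xml->getName()];
foreach ($xml->attributes() as $attrName => $unused) {
    $nameParts[] = $attrName;
}
$originalName = implode(' ', $nameParts);

// Regular SimpleXML navigation works on the repaired tree.
$transport = $xml->PublicTransport;
$id        = (string) $transport['id'];
$location  = (string) $transport->Files->localfile->location;
$format    = (string) $transport->Files->localfile->format;

echo $originalName, "\n";                     // Public Transport Rules
echo $id, ' ', $location, ' ', $format, "\n"; // 0 Citybus Events
```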
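Answer 1 (score: 1)
The DOM functions (the HTML loader in particular) are designed to cope with invalid markup, so you could give them a try: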
<?php
$string = '<?xml version="1.0" ?>
<Public Transport Rules>
<PublicTransport id="0">
<Issued>null</Issued>
<Files><localfile>
<location>Citybus</location>
<format>Events</format>
</localfile>
</Files>
</PublicTransport>
</Public>';
$dom = new DOMDocument;
libxml_use_internal_errors(TRUE);
$dom->loadHTML($string);
libxml_use_internal_errors(FALSE);
$dom->formatOutput = TRUE;
echo '::: Original XML :::' . PHP_EOL;
echo $string . PHP_EOL;
echo PHP_EOL;
echo '::: Fixed XML :::' . PHP_EOL;
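if( version_compare(PHP_VERSION, '5.3.6', '>=') ){
    // loadHTML() wraps the markup in <html><body>, so <body> is the first child of <html> here
    $body = $dom->documentElement->firstChild;
    if( $body->hasChildNodes() ){
        foreach($body->childNodes as $node){
            // saveHTML() accepts a node argument as of PHP 5.3.6
            echo $dom->saveHTML($node);
        }
    }
}else{
    $body = $dom->getElementsByTagName('body')->item(0);
    if( $body->hasChildNodes() ){
        foreach($body->childNodes as $node){
            // Older PHP: saveHTML() takes no node argument, so serialize each node with saveXML()
            echo $dom->saveXML($node);
        }
    }
}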
echo PHP_EOL;
...which prints:
::: Original XML :::
<?xml version="1.0" ?>
<Public Transport Rules>
<PublicTransport id="0">
<Issued>null</Issued>
<Files><localfile>
<location>Citybus</location>
<format>Events</format>
</localfile>
</Files>
</PublicTransport>
</Public>
::: Fixed XML :::
<public transport rules><publictransport id="0"><issued>null</issued><files><localfile>
<location>Citybus</location>
<format>Events</format>
</localfile>
</files></publictransport></public>
There's no way to know what gets lost in the process, but we are dealing with invalid data in the first place.
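One way to get at least a hint of what the parser stumbled over is to collect the libxml errors instead of discarding them. This is only a rough sketch: `loadLeniently()` is a made-up helper name, and the messages you get back (if any) depend on the libxml version PHP was built against.

```php
<?php
/**
 * Hypothetical helper: load markup with the lenient HTML parser and
 * return the document together with whatever libxml complained about.
 */
function loadLeniently(string $markup): array
{
    // Collect errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($markup);

    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the previous setting

    return [$dom, $errors];
}

$broken = '<?xml version="1.0" ?>
<Public Transport Rules>
<PublicTransport id="0">
<Issued>null</Issued>
<Files><localfile>
<location>Citybus</location>
<format>Events</format>
</localfile>
</Files>
</PublicTransport>
</Public Transport Rules>';

list($dom, $errors) = loadLeniently($broken);

// Each entry is a LibXMLError object with line and message properties.
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

// Quick sanity check that a known value survived the repair.
$survived = strpos($dom->documentElement->textContent, 'Citybus') !== false;
echo $survived ? "Citybus is still there\n" : "Citybus was lost\n";
```

Spot checks like the last one won't prove that nothing was lost, but they are cheap to run across a large batch of files.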
In any case, you can always use PHP to edit every offending file automatically. Your files may not be XML, but they certainly are strings ;-)
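To illustrate the string approach, here is a minimal sketch. `fixSpacedTagNames()` is a made-up helper, and it rests on one assumption: a tag that consists only of bare words (no `=` anywhere) is a broken tag name, not an element with attributes. Check that this holds for your files before running it on all of them.

```php
<?php
/**
 * Hypothetical helper: glue together tag names that contain spaces,
 * e.g. <Public Transport Rules> becomes <PublicTransportRules>.
 * Tags with real attributes (name="value") are left untouched.
 */
function fixSpacedTagNames(string $xml): string
{
    // Opening or closing tag made of two or more bare words.
    $pattern = '/<(\/?)([A-Za-z_][\w.\-]*(?:\s+[A-Za-z_][\w.\-]*)+)\s*>/';

    return preg_replace_callback($pattern, function (array $m): string {
        // Drop the whitespace between the words, keep the optional slash.
        return '<' . $m[1] . preg_replace('/\s+/', '', $m[2]) . '>';
    }, $xml);
}

$broken = '<?xml version="1.0" ?>
<Public Transport Rules>
<PublicTransport id="0">
<Issued>null</Issued>
<Files><localfile>
<location>Citybus</location>
<format>Events</format>
</localfile>
</Files>
</PublicTransport>
</Public Transport Rules>';

$xml = simplexml_load_string(fixSpacedTagNames($broken));

echo $xml->getName(), "\n";                                    // PublicTransportRules
echo $xml->PublicTransport->Files->localfile->location, "\n";  // Citybus

// Applying it to a whole directory could look like this (placeholder path):
// foreach (glob('/path/to/files/*.xml') as $file) {
//     file_put_contents($file, fixSpacedTagNames(file_get_contents($file)));
// }
```

Since the loop overwrites files in place, it is worth running it on a copy of the data first.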