I am extracting data from XML, and some tags contain their data inside CDATA, like this:
<description><![CDATA[Changes (as compared to 8.17) include:
Features:
* Added a ‘Schema Optimizer’ feature. Based on “procedure analyse()” it will propose alterations to data types for a table based on analysis on what data are stored in the table. The feature is available from INFO tab/HTML mode. Refer to documentation for details.
* A table can now be added [...]]]>
</description>
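I am already using preg_match to extract the data from the description tag. So how do I extract the data from the CDATA?
Answer 0 (score: 7)
Whatever language you are using, do not parse XML with regular expressions; you will almost certainly get it wrong. Use an XML parser.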
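As a minimal sketch of what using a parser could look like for the question's `<description>` element, here is one way to do it with PHP's SimpleXML. The `<item>` wrapper element and the shortened CDATA text are assumptions made for illustration; the real feed structure is not shown in the question.

```php
<?php
// Hypothetical input: the <item> wrapper and the shortened text are assumptions.
$xml = <<<XML
<?xml version="1.0"?>
<item>
<description><![CDATA[Changes (as compared to 8.17) include:
Features:
* Added a 'Schema Optimizer' feature.]]>
</description>
</item>
XML;

$item = simplexml_load_string($xml);
if ($item === false) {
    throw new RuntimeException('Could not parse XML');
}

// Casting the element to string yields its text content, CDATA included,
// without the <![CDATA[ ... ]]> wrapper itself.
$description = trim((string) $item->description);
echo $description, PHP_EOL;
```

The cast to string is the important step: dumping the element with print_r or var_dump may show it as empty even though the CDATA text is there.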
Answer 1 (score: 0)
If you need to extract a complex set of data, you should use SimpleXML and XPath.
<?php
$string = <<<XML
<?xml version='1.0'?>
<document>
<title>Forty What?</title>
<from>Joe</from>
<to>Jane</to>
<body>
I know that's the answer -- but what's the question?
</body>
</document>
XML;
$xml = simplexml_load_string($string);
print_r($xml);
?>
This produces output like the following:
SimpleXMLElement Object
(
[title] => Forty What?
[from] => Joe
[to] => Jane
[body] =>
I know that's the answer -- but what's the question?
)
So in your case you only need to navigate the document, which is easier than writing a regular expression, isn't it?
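The navigation step could be sketched roughly as follows, using both direct property access and `xpath()`. The document is the one from this answer, except that the `<body>` text has been wrapped in CDATA here as an assumption, to mirror the question.

```php
<?php
// Same document as above; wrapping <body> in CDATA is an assumption for illustration.
$string = <<<XML
<?xml version='1.0'?>
<document>
 <title>Forty What?</title>
 <from>Joe</from>
 <to>Jane</to>
 <body><![CDATA[I know that's the answer -- but what's the question?]]></body>
</document>
XML;

$xml = simplexml_load_string($string);
if ($xml === false) {
    throw new RuntimeException('Could not parse XML');
}

// xpath() returns an array of matching SimpleXMLElement objects.
$matches = $xml->xpath('/document/body');
$body = $matches ? (string) $matches[0] : null;

// For simple paths, direct property access works as well.
$title = (string) $xml->title;

echo $title, ': ', $body, PHP_EOL;
```

In both cases the string cast is what returns the text, whether it was stored as plain text or as a CDATA section.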
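Answer 2 (score: 0)
@Pavel Minaev is right: keep regular expressions as a last resort, and always use an XML parser for XML. These days you can find an XML parser for almost every language. For example, I usually use DOMDocument to parse or create XML in PHP. It is simple and easy to understand, which makes it a good fit for people like me who only use PHP occasionally.
For example, suppose you want to extract the CDATA from the following XML: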
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE message SYSTEM "https://www.abcd.com/dtds/AbcdefMessageXmlApi.dtd">
<message id="9002">
<report>
<![CDATA[id:50121515075540159 sub:001 text text text text text]]>
</report>
<number>353874181931</number>
</message>
Use the following code to extract the CDATA:
// $xml_response (the XML string) and $log (a logger) are assumed to be defined elsewhere.
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
if (TRUE != $doc->loadXML($xml_response)) {
    // log the error and/or throw an exception, as appropriate
}
$response_element = $doc->documentElement;
if ($response_element->tagName == "message") {
    $report_node = $response_element->getElementsByTagName("report");
    if ($report_node != null && $report_node->length == 1) {
        // textContent returns the element's text, including CDATA content
        $narrative = $report_node->item(0)->textContent;
        $log->debug("CDATA: $narrative");
    } else {
        $log->error("unable to find report tag or multiple report tag found in response xml");
    }
} else {
    $log->error("unexpected root tag (" . $response_element->tagName . ") in response xml");
}
After this runs, the $narrative variable should contain all of the text. Don't worry, it will not include the ugly CDATA markup.
Happy coding :)
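If only the CDATA sections are wanted, rather than all text inside the element, one possible variation is to walk the child nodes and check their node type. This is a sketch, not part of the original answer: the DOCTYPE line is left out so that nothing external is referenced, and the `$cdata_parts` name is made up for illustration.

```php
<?php
// Sample response from the answer, minus the DOCTYPE line (an assumption for this sketch).
$xml_response = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<message id="9002">
<report>
<![CDATA[id:50121515075540159 sub:001 text text text text text]]>
</report>
<number>353874181931</number>
</message>
XML;

$doc = new DOMDocument();
if (!$doc->loadXML($xml_response)) {
    throw new RuntimeException('Could not parse response XML');
}

$reports = $doc->getElementsByTagName('report');
if ($reports->length !== 1) {
    throw new RuntimeException('Expected exactly one <report> element');
}

// Collect only the CDATA sections, skipping surrounding whitespace text nodes.
$cdata_parts = [];
foreach ($reports->item(0)->childNodes as $child) {
    if ($child->nodeType === XML_CDATA_SECTION_NODE) {
        $cdata_parts[] = $child->nodeValue;
    }
}

$narrative = implode('', $cdata_parts);
echo "CDATA: $narrative", PHP_EOL;
```

For an element that holds a single CDATA section, `trim($element->textContent)` gives the same result with less code; the node-type check only matters when the element mixes CDATA with other text.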