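I have a large number of big XML files from which I want to extract some data. I am using the evaluation version of Altova XMLSpy, in which I managed to get XPath working. However, I need the data in CSV or text format so that I can evaluate it further in R or Excel, and I cannot copy the XPath results to a file. I found out that I can with XQuery, but I cannot get XQuery to work for even a single file.
The structure of the XML is as follows: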
<d2LogicalModel xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datex2.eu/schema/2_0/2_0" modelBaseVersion="2.0" xsi:schemaLocation="http://datex2.eu/schema/2_0/2_0 D:\NDW\CSS\DataGenerator\DATEXIISchema_2_0_2_0.xsd">
<payloadPublication xmlns="http://datex2.eu/schema/2_0/2_0" xsi:type="MeasuredDataPublication" lang="nl">
<publicationTime>2011-04-21T05:58:34Z</publicationTime>
<publicationCreator>
<country>nl</country>
<nationalIdentifier>NDW-CNS</nationalIdentifier>
</publicationCreator>
<measurementSiteTableReference>NDW01_MT_321</measurementSiteTableReference>
<headerInformation>
<areaOfInterest>national</areaOfInterest>
<confidentiality>restrictedToAuthorities</confidentiality>
<informationStatus>real</informationStatus>
</headerInformation>
<siteMeasurements>
<measurementSiteReference>GRT01_MORO_1002_2</measurementSiteReference>
<measurementTimeDefault>2011-04-21T05:57:00Z</measurementTimeDefault>
<measuredValue index="1">
<basicDataValue xsi:type="TrafficSpeed"/>
</measuredValue>
<measuredValue index="2">
<basicDataValue xsi:type="TrafficSpeed"/>
</measuredValue>
<measuredValue index="3">
<basicDataValue xsi:type="TrafficSpeed"/>
</measuredValue>
<measuredValue index="4">
<basicDataValue xsi:type="TrafficSpeed"/>
</measuredValue>
<measuredValue index="5">
<basicDataValue xsi:type="TrafficSpeed"/>
</measuredValue>
<measuredValue index="6">
<basicDataValue xsi:type="TrafficSpeed"/>
</measuredValue>
</siteMeasurements>
<siteMeasurements>
<measurementSiteReference>RWS01_MONIBAS_0021hrr2131ra</measurementSiteReference>
<measurementTimeDefault>2011-04-21T05:57:00Z</measurementTimeDefault>
<measuredValue index="1">
<basicDataValue xsi:type="TrafficFlow">
<time>2011-04-21T05:56:00Z</time>
<vehicleFlow>900</vehicleFlow>
</basicDataValue>
</measuredValue>
<measuredValue index="2">
<basicDataValue xsi:type="TrafficSpeed">
<numberOfInputValuesUsed>60</numberOfInputValuesUsed>
<standardDeviation>0</standardDeviation>
<time>2011-04-21T05:56:00Z</time>
<averageVehicleSpeed>115</averageVehicleSpeed>
</basicDataValue>
</measuredValue>
<measuredValue index="3">
<basicDataValue xsi:type="TrafficFlow">
<time>2011-04-21T05:56:00Z</time>
<vehicleFlow>1020</vehicleFlow>
</basicDataValue>
</measuredValue>
<measuredValue index="4">
<basicDataValue xsi:type="TrafficSpeed">
<numberOfInputValuesUsed>60</numberOfInputValuesUsed>
<standardDeviation>0</standardDeviation>
<time>2011-04-21T05:56:00Z</time>
<averageVehicleSpeed>104</averageVehicleSpeed>
</basicDataValue>
</measuredValue>
</siteMeasurements>
I want to filter on a specific value of measurementSiteReference and get the results for every measuredValue whose basicDataValue is of type TrafficFlow, preferably in the following format:
index, value, timestamp
1, 900, 05:56:00
3, 1020, 05:56:00
I have the following XPath:
//text()[contains(.,"GEO01_Z_RWSTI1011")]/parent::*/parent::*/descendant::measuredValue[(@index)]/basicDataValue/vehicleFlow
This gives me the results for one file, but I cannot find a way to convert the XPath into XQuery. My current XQuery returns no results:
let $nl := "&#10;"
for $x in doc("TrafficSpeed 20110421 0800-1559\0800_trafficspeed")/d2LogicalModel/payloadPublication/siteMeasurements
where $x/measurementSiteReference/text()[contains(.,"GEO01_Z_RWSTI1011")]
return concat($x/measurementSiteReference/measuredValue,$nl)
How can I get the output I want with XQuery?
Answer 0 (score: 0)
Try http://www.stylusstudio.com/xquery_primer.html or http://xmlbeans.apache.org/docs/2.0.0/guide/conSelectingXMLwithXQueryPathXPath.html
There are also good examples from Oracle, IBM, and Microsoft, if you really need advanced help.
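Answer 1 (score: 0)
Your elements are bound to the namespace xmlns="http://datex2.eu/schema/2_0/2_0", but you are not namespace-qualifying the elements in your XPath expressions. As a result, the expressions do not select the elements you want.
You may want to do something like this to declare the namespace and use it in the XPath expressions: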
declare namespace datex = "http://datex2.eu/schema/2_0/2_0";
let $nl := "&#10;"
for $x in doc("TrafficSpeed 20110421 0800-1559\0800_trafficspeed")/datex:d2LogicalModel/datex:payloadPublication/datex:siteMeasurements
where $x/datex:measurementSiteReference/text()[contains(.,"GEO01_Z_RWSTI1011")]
return concat($x/datex:measurementSiteReference/datex:measuredValue,$nl)
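However, you will probably run into problems calling concat() on a sequence: it expects a single item per argument, whereas string-join() is the function that accepts a sequence. In addition, measuredValue is a child of siteMeasurements rather than of measurementSiteReference, so the path in the return clause selects nothing, and the code as written will not produce the output you want.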
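As an alternative to prefixing every step, a minimal sketch: declaring the DATEX II namespace as the default element namespace should let the unprefixed element names from the question match. The file name 0800_trafficspeed.xml is a placeholder, not the real path, and measuredValue is addressed as a child of siteMeasurements.

```xquery
(: Sketch only: the file name is a placeholder. :)
(: Unprefixed element names in path steps now resolve to the DATEX II namespace. :)
declare default element namespace "http://datex2.eu/schema/2_0/2_0";

for $x in doc("0800_trafficspeed.xml")/d2LogicalModel/payloadPublication/siteMeasurements
where contains($x/measurementSiteReference, "GEO01_Z_RWSTI1011")
(: measuredValue is a sibling of measurementSiteReference, so start from $x :)
return $x/measuredValue/basicDataValue/vehicleFlow
```

The default element namespace applies to element names only; attributes such as @index stay unqualified.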
Answer 2 (score: 0)
I managed to get an answer, though I think it is incomplete:
declare namespace datex = "http://datex2.eu/schema/2_0/2_0";
declare variable $sep := ',';
declare variable $eol := '&#10;';
for $x in collection("0900_trafficspeed")/datex:d2LogicalModel/datex:payloadPublication/datex:siteMeasurements
let $site := $x/datex:measurementSiteReference/text()
let $time := $x/datex:measurementTimeDefault/text()
let $index := $x/datex:measurementSiteReference/parent::*/descendant::datex:measuredValue/@index
let $flow := $x/datex:measurementSiteReference/parent::*/descendant::datex:measuredValue/datex:basicDataValue/datex:vehicleFlow/text()
where $x/datex:measurementSiteReference/text()[contains(.,"GEO01_Z_RWSTI1011")]
return string(concat(string-join(($site,$time,$flow),$sep),$eol))
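One possible way to complete this is sketched below: it emits one row per measuredValue that actually carries a vehicleFlow, in the index, value, timestamp layout asked for in the question. It rests on several assumptions: a single input file (0900_trafficspeed.xml is a placeholder name), a timestamp that always has the form shown in the sample so that substring() can cut out the time part, and the presence of vehicleFlow as a stand-in for testing xsi:type.

```xquery
declare namespace datex = "http://datex2.eu/schema/2_0/2_0";

(: Assumptions: placeholder file name; site id taken from the question. :)
declare variable $siteId := "GEO01_Z_RWSTI1011";
declare variable $sep := ", ";
declare variable $eol := "&#10;";

string-join(
  (
    "index, value, timestamp",
    for $site in doc("0900_trafficspeed.xml")
                   /datex:d2LogicalModel/datex:payloadPublication/datex:siteMeasurements
                   [contains(datex:measurementSiteReference, $siteId)]
    (: keep only measured values that carry a vehicleFlow, i.e. non-empty TrafficFlow entries :)
    for $mv in $site/datex:measuredValue[datex:basicDataValue/datex:vehicleFlow]
    let $data := $mv/datex:basicDataValue
    return string-join(
      (
        string($mv/@index),
        string($data/datex:vehicleFlow),
        (: "2011-04-21T05:56:00Z" -> "05:56:00" :)
        substring(string($data/datex:time), 12, 8)
      ),
      $sep
    )
  ),
  $eol
)
```

With the sample site RWS01_MONIBAS_0021hrr2131ra substituted for $siteId, this should yield rows for index 1 and 3 only. How the resulting string ends up in a text file depends on the XQuery processor's serialization settings.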