Getting data from XML into text with XQuery

Time: 2011-08-17 20:05:06

Tags: xml xpath xquery

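I have a large number of big XML files from which I want to extract some data. I am using an evaluation version of Altova XMLSpy, in which I managed to get XPath working. However, I need the data in CSV or text format so that I can evaluate it further in R or Excel, and I cannot copy the XPath results to a file. I found that this is possible with XQuery, but I cannot get XQuery to work for even a single file.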

The XML is structured as follows:

<d2LogicalModel xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datex2.eu/schema/2_0/2_0" modelBaseVersion="2.0" xsi:schemaLocation="http://datex2.eu/schema/2_0/2_0 D:\NDW\CSS\DataGenerator\DATEXIISchema_2_0_2_0.xsd">
<payloadPublication xmlns="http://datex2.eu/schema/2_0/2_0" xsi:type="MeasuredDataPublication" lang="nl">
    <publicationTime>2011-04-21T05:58:34Z</publicationTime>
    <publicationCreator>
        <country>nl</country>
        <nationalIdentifier>NDW-CNS</nationalIdentifier>
    </publicationCreator>
    <measurementSiteTableReference>NDW01_MT_321</measurementSiteTableReference>
    <headerInformation>
        <areaOfInterest>national</areaOfInterest>
        <confidentiality>restrictedToAuthorities</confidentiality>
        <informationStatus>real</informationStatus>
    </headerInformation>
    <siteMeasurements>
        <measurementSiteReference>GRT01_MORO_1002_2</measurementSiteReference>
        <measurementTimeDefault>2011-04-21T05:57:00Z</measurementTimeDefault>
        <measuredValue index="1">
            <basicDataValue xsi:type="TrafficSpeed"/>
        </measuredValue>
        <measuredValue index="2">
            <basicDataValue xsi:type="TrafficSpeed"/>
        </measuredValue>
        <measuredValue index="3">
            <basicDataValue xsi:type="TrafficSpeed"/>
        </measuredValue>
        <measuredValue index="4">
            <basicDataValue xsi:type="TrafficSpeed"/>
        </measuredValue>
        <measuredValue index="5">
            <basicDataValue xsi:type="TrafficSpeed"/>
        </measuredValue>
        <measuredValue index="6">
            <basicDataValue xsi:type="TrafficSpeed"/>
        </measuredValue>
    </siteMeasurements>
    <siteMeasurements>
        <measurementSiteReference>RWS01_MONIBAS_0021hrr2131ra</measurementSiteReference>
        <measurementTimeDefault>2011-04-21T05:57:00Z</measurementTimeDefault>
        <measuredValue index="1">
            <basicDataValue xsi:type="TrafficFlow">
                <time>2011-04-21T05:56:00Z</time>
                <vehicleFlow>900</vehicleFlow>
            </basicDataValue>
        </measuredValue>
        <measuredValue index="2">
            <basicDataValue xsi:type="TrafficSpeed">
                <numberOfInputValuesUsed>60</numberOfInputValuesUsed>
                <standardDeviation>0</standardDeviation>
                <time>2011-04-21T05:56:00Z</time>
                <averageVehicleSpeed>115</averageVehicleSpeed>
            </basicDataValue>
        </measuredValue>
        <measuredValue index="3">
            <basicDataValue xsi:type="TrafficFlow">
                <time>2011-04-21T05:56:00Z</time>
                <vehicleFlow>1020</vehicleFlow>
            </basicDataValue>
        </measuredValue>
        <measuredValue index="4">
            <basicDataValue xsi:type="TrafficSpeed">
                <numberOfInputValuesUsed>60</numberOfInputValuesUsed>
                <standardDeviation>0</standardDeviation>
                <time>2011-04-21T05:56:00Z</time>
                <averageVehicleSpeed>104</averageVehicleSpeed>
            </basicDataValue>
        </measuredValue>
    </siteMeasurements>

I want to filter on a specific value of measurementSiteReference and get the results of every measuredValue whose basicDataValue is of type TrafficFlow, preferably in the following format:

index, value, timestamp
1, 900, 05:56:00
3, 1020, 05:56:00

I have the following XPath:

//text()[contains(.,"GEO01_Z_RWSTI1011")]/parent::*/parent::*/descendant::measuredValue[(@index)]/basicDataValue/vehicleFlow

This gives me the results for one file, but I cannot find a way to convert the XPath into XQuery. My current XQuery returns no results:

let $nl := "&#10;"
for $x in doc("TrafficSpeed 20110421 0800-1559\0800_trafficspeed")/d2LogicalModel/payloadPublication/siteMeasurements
where $x/measurementSiteReference/text()[contains(.,"GEO01_Z_RWSTI1011")]
return concat($x/measurementSiteReference/measuredValue,$nl)

How can I use XQuery to get the output I want?

3 answers:

Answer 0: (score: 0)

Try http://www.stylusstudio.com/xquery_primer.html or http://xmlbeans.apache.org/docs/2.0.0/guide/conSelectingXMLwithXQueryPathXPath.html

There are also good examples from Oracle, IBM, and Microsoft, if you really need more advanced help.

Answer 1: (score: 0)

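Your elements are bound to the namespace xmlns="http://datex2.eu/schema/2_0/2_0", but you are not namespace-qualifying the element names in your XPath statement. As a result, the statement does not select the elements you want.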

You may want to do something like this to declare the namespace and use it in the XPath statements:

declare namespace datex = "http://datex2.eu/schema/2_0/2_0";

let $nl := "&#10;"

for $x in doc("TrafficSpeed 20110421 0800-1559\0800_trafficspeed")/datex:d2LogicalModel/datex:payloadPublication/datex:siteMeasurements
where $x/datex:measurementSiteReference/text()[contains(.,"GEO01_Z_RWSTI1011")]
return concat($x/datex:measurementSiteReference/datex:measuredValue,$nl)

However, you may run into problems using concat() on a sequence, and the code as it stands will not produce the output you want.
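To illustrate the concat() caveat: in XQuery 1.0 each argument of concat() must be a single item (or empty), so passing a sequence of several nodes raises a type error, whereas string-join() accepts a whole sequence. Note also that in the sample XML measuredValue is a sibling of measurementSiteReference, not a child of it, so the path has to start from siteMeasurements. A minimal sketch, assuming a hypothetical file name 0800_trafficspeed.xml and a non-schema-validated document:

```xquery
declare namespace datex = "http://datex2.eu/schema/2_0/2_0";

(: "0800_trafficspeed.xml" is a hypothetical file name; use your own URI. :)
for $x in doc("0800_trafficspeed.xml")
            /datex:d2LogicalModel/datex:payloadPublication/datex:siteMeasurements
where contains($x/datex:measurementSiteReference, "GEO01_Z_RWSTI1011")
return
  (: string-join() takes a whole sequence; concat() takes single items only :)
  string-join($x/datex:measuredValue/datex:basicDataValue/datex:vehicleFlow/string(), ",")
```

This returns one comma-separated string of flow values per matching site. It does not yet give one row per measuredValue.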

Answer 2: (score: 0)

I managed to get an answer, though I think it is still incomplete:

declare namespace datex = "http://datex2.eu/schema/2_0/2_0";
declare variable  $sep := ',';
declare variable  $eol := '&#10;';

for $x in collection("0900_trafficspeed")/datex:d2LogicalModel/datex:payloadPublication/datex:siteMeasurements
let $site := $x/datex:measurementSiteReference/text()
let $time := $x/datex:measurementTimeDefault/text()
let $index := $x/datex:measurementSiteReference/parent::*/descendant::datex:measuredValue/@index
let $flow := $x/datex:measurementSiteReference/parent::*/descendant::datex:measuredValue/datex:basicDataValue/datex:vehicleFlow/text()
where $x/datex:measurementSiteReference/text()[contains(.,"GEO01_Z_RWSTI1011")]
return string(concat(string-join(($site,$time,$flow),$sep),$eol))
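One way the query above might be completed to produce the "index, value, timestamp" layout from the question is to iterate over the measuredValue elements rather than over the sites, keeping only those whose basicDataValue has xsi:type "TrafficFlow". This is only a sketch under assumptions: the file name 0800_trafficspeed.xml and the variable $site are hypothetical, the site identifier is taken from the sample XML, and the document is assumed not to be schema-validated (so xsi:type compares as plain text).

```xquery
declare namespace datex = "http://datex2.eu/schema/2_0/2_0";
(: The xsi prefix is predeclared in XQuery, so it needs no declaration here. :)

declare variable $sep  := ", ";
declare variable $eol  := "&#10;";
(: Hypothetical: site id copied from the sample XML; replace with your own. :)
declare variable $site := "RWS01_MONIBAS_0021hrr2131ra";

string-join(
  (
    "index, value, timestamp",
    (: one row per measuredValue of type TrafficFlow at the chosen site :)
    for $m in doc("0800_trafficspeed.xml")
                /datex:d2LogicalModel/datex:payloadPublication
                /datex:siteMeasurements[contains(datex:measurementSiteReference, $site)]
                /datex:measuredValue[datex:basicDataValue/@xsi:type = "TrafficFlow"]
    let $v := $m/datex:basicDataValue
    return string-join(
      ( string($m/@index),
        string($v/datex:vehicleFlow),
        (: characters 12 to 19 of 2011-04-21T05:56:00Z are the time of day :)
        substring(string($v/datex:time), 12, 8) ),
      $sep)
  ),
  $eol)
```

For the second site in the sample this should yield the header line followed by the rows for index 1 and index 3. To cover many files, doc(...) could be swapped for collection(...) as in the query above, though how collection URIs are resolved depends on the XQuery processor.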