R: Extracting the contents of specific nodes from XML data

Time: 2014-07-16 16:07:08

Tags: xml r xpath xml-parsing treenode

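Using R and the XML package (xmlTreeParse etc.), I have been trying to read specific nodes from an XML file, without success. The following dummy XML example represents the data I am working with: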

<item> 
<title> Mickey Mouse </title>
<description> Cartoon </description>
<pubDate> 25 Apr 1965 </pubDate>
 <disney:Filing web="http://www.waltdisney.com/archives">
 <disney:fileNumber>125364</disney:fileNumber>
 <disney:assignedID>7389</disney:assignedID>
 <disney:Files>
  <disney:File disney:set="1" disney:file="abc.mov" disney:type="B&W"/>
  <disney:File disney:set="2" disney:file="def.mov" disney:type="Col"/>
  <disney:File disney:set="3" disney:file="wzt.mov" disney:type="B&W"/>
 </disney:Files>
</disney:Filing>
</item> 

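I applied xpathApply and successfully extracted the first three nodes. But I cannot get at the nodes labelled "disney:File". For some reason, the disney:-prefixed elements are unreadable ("invisible").

My goal is to extract all the disney:File rows into a data frame or, even nicer, to first search for a particular disney:set and extract all the information in that node alone into a data frame. Any help would be great. Thanks in advance!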

1 Answer:

Answer 0 (score: 2)

Some sample data:

'<?xml version="1.0"?>
<aw:PurchaseOrder
    aw:PurchaseOrderNumber="99503"
aw:OrderDate="1999-10-20"
xmlns:aw="http://www.adventure-works.com">
<aw:Address aw:Type="Shipping">
<aw:Name>Ellen Adams</aw:Name>
<aw:Street>123 Maple Street</aw:Street>
<aw:City>Mill Valley</aw:City>
<aw:State>CA</aw:State>
<aw:Zip>10999</aw:Zip>
<aw:Country>USA</aw:Country>
</aw:Address>
<aw:Address aw:Type="Billing">
<aw:Name>Tai Yee</aw:Name>
<aw:Street>8 Oak Avenue</aw:Street>
<aw:City>Old Town</aw:City>
<aw:State>PA</aw:State>
<aw:Zip>95819</aw:Zip>
<aw:Country>USA</aw:Country>
</aw:Address>
<aw:DeliveryNotes>Please leave packages in shed by driveway.</aw:DeliveryNotes>
<aw:Items>
<aw:Item aw:PartNumber="872-AA">
<aw:ProductName>Lawnmower</aw:ProductName>
<aw:Quantity>1</aw:Quantity>
<aw:USPrice>148.95</aw:USPrice>
<aw:Comment>Confirm this is electric</aw:Comment>
</aw:Item>
<aw:Item aw:PartNumber="926-AA">
<aw:ProductName>Baby Monitor</aw:ProductName>
<aw:Quantity>2</aw:Quantity>
<aw:USPrice>39.98</aw:USPrice>
<aw:ShipDate>1999-05-21</aw:ShipDate>
</aw:Item>
</aw:Items>
</aw:PurchaseOrder>' -> xData

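You can declare the namespace yourself and give it a prefix of your choice; here we use ns. In this case we could also use aw:Item directly, because the document already declares the aw prefix, but we assign our own prefix as an example: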

library(XML)
myData <- xmlParse(xData)
# Map the prefix "ns" to the document's namespace URI and use it in the XPath
xpathSApply(myData, "//*/ns:Item/ns:ProductName",
            namespaces = c(ns = "http://www.adventure-works.com"),
            xmlValue)
# [1] "Lawnmower"    "Baby Monitor"
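
Applied to the data in the question, the same idea might look like the sketch below. It rests on assumptions that are not in the original post: that the real file declares the disney prefix with an xmlns:disney attribute (the sample's web="..." attribute does not declare a namespace, so the URI used here is a guess), and that the & in "B&W" is escaped as &amp;. The names disneyXml, fileNodes, files and the prefix d are illustrative only.

```r
library(XML)

# Assumed, well-formed variant of the question's sample: the disney prefix is
# declared via xmlns:disney and the ampersand in "B&W" is escaped.
disneyXml <- '<item xmlns:disney="http://www.waltdisney.com/archives">
<title>Mickey Mouse</title>
<disney:Filing>
<disney:fileNumber>125364</disney:fileNumber>
<disney:Files>
<disney:File disney:set="1" disney:file="abc.mov" disney:type="B&amp;W"/>
<disney:File disney:set="2" disney:file="def.mov" disney:type="Col"/>
<disney:File disney:set="3" disney:file="wzt.mov" disney:type="B&amp;W"/>
</disney:Files>
</disney:Filing>
</item>'

doc <- xmlParse(disneyXml, asText = TRUE)

# Bind our own prefix "d" to the namespace URI, as in the answer above
ns <- c(d = "http://www.waltdisney.com/archives")

# All disney:File nodes, wherever they sit in the tree
fileNodes <- getNodeSet(doc, "//d:File", namespaces = ns)

# One row per disney:File node, one column per attribute
files <- data.frame(
  set  = sapply(fileNodes, xmlGetAttr, "set"),
  file = sapply(fileNodes, xmlGetAttr, "file"),
  type = sapply(fileNodes, xmlGetAttr, "type"),
  stringsAsFactors = FALSE
)

# Option 1: filter the data frame for a particular set
set2Row <- files[files$set == "2", ]

# Option 2: select the node directly with an XPath attribute predicate
set2 <- getNodeSet(doc, "//d:File[@d:set='2']", namespaces = ns)
```

The attribute values come back as character strings, so convert files$set with as.integer() if numeric comparison is needed. If the real file never declares the disney prefix, it is not namespace-well-formed, and these namespace-aware XPath queries would probably not match anything.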