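Using R and the XML package (xmlTreeParse etc.), I have been trying to read specific nodes from an XML file, without success. The following dummy XML example represents the data I am working with: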
<item>
<title> Mickey Mouse </title>
<description> Cartoon </description>
<pubDate> 25 Apr 1965 </pubDate>
<disney:Filing web="http://www.waltdisney.com/archives">
<disney:fileNumber>125364</disney:fileNumber>
<disney:assignedID>7389</disney:assignedID>
<disney:Files>
<disney:File disney:set="1" disney:file="abc.mov" disney:type="B&W"/>
<disney:File disney:set="2" disney:file="def.mov" disney:type="Col"/>
<disney:File disney:set="3" disney:file="wzt.mov" disney:type="B&W"/>
</disney:Files>
</disney:Filing>
</item>
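Using xpathApply I can extract the first three nodes successfully, but I cannot get into the nodes labelled "disney:File". For some reason, everything under the disney: prefix is unreadable ("invisible").
My goal is to extract all disney:File rows into a data frame or, even better, to first search for a specific disney:set and extract only the information in that node into a data frame. Any help would be great. Thanks in advance!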
Answer 0 (score: 2)
Some example data:
'<?xml version="1.0"?>
<aw:PurchaseOrder
aw:PurchaseOrderNumber="99503"
aw:OrderDate="1999-10-20"
xmlns:aw="http://www.adventure-works.com">
<aw:Address aw:Type="Shipping">
<aw:Name>Ellen Adams</aw:Name>
<aw:Street>123 Maple Street</aw:Street>
<aw:City>Mill Valley</aw:City>
<aw:State>CA</aw:State>
<aw:Zip>10999</aw:Zip>
<aw:Country>USA</aw:Country>
</aw:Address>
<aw:Address aw:Type="Billing">
<aw:Name>Tai Yee</aw:Name>
<aw:Street>8 Oak Avenue</aw:Street>
<aw:City>Old Town</aw:City>
<aw:State>PA</aw:State>
<aw:Zip>95819</aw:Zip>
<aw:Country>USA</aw:Country>
</aw:Address>
<aw:DeliveryNotes>Please leave packages in shed by driveway.</aw:DeliveryNotes>
<aw:Items>
<aw:Item aw:PartNumber="872-AA">
<aw:ProductName>Lawnmower</aw:ProductName>
<aw:Quantity>1</aw:Quantity>
<aw:USPrice>148.95</aw:USPrice>
<aw:Comment>Confirm this is electric</aw:Comment>
</aw:Item>
<aw:Item aw:PartNumber="926-AA">
<aw:ProductName>Baby Monitor</aw:ProductName>
<aw:Quantity>2</aw:Quantity>
<aw:USPrice>39.98</aw:USPrice>
<aw:ShipDate>1999-05-21</aw:ShipDate>
</aw:Item>
</aw:Items>
</aw:PurchaseOrder>' -> xData
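You can declare the namespace and bind it to a prefix of your choice; here we use ns. In this case we could simply use the document's own prefix (aw:Item), but we bind the namespace to our own prefix as an example: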
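library(XML)
# Parse the XML string defined above
myData <- xmlParse(xData)
# Bind the prefix "ns" to the document's namespace URI and use it in the XPath
xpathSApply(myData, "//*/ns:Item/ns:ProductName",
            namespaces = c(ns = "http://www.adventure-works.com"),
            xmlValue)
# [1] "Lawnmower" "Baby Monitor"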
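The same idea can be applied to the data in the question. The following is only a sketch under stated assumptions: the snippet in the question does not show an xmlns:disney declaration, so the namespace URI below is assumed, and the "&" in "B&W" is escaped as "&amp;" because an unescaped ampersand is not well-formed XML. The prefix d and the helper name files_to_df are made up for illustration.

```r
library(XML)

# Hypothetical, well-formed variant of the question's data.
# Assumptions: the namespace is declared via xmlns:disney with this URI,
# and "&" is written as "&amp;".
xml_text <- '<item xmlns:disney="http://www.waltdisney.com/archives">
  <title>Mickey Mouse</title>
  <disney:Filing>
    <disney:Files>
      <disney:File disney:set="1" disney:file="abc.mov" disney:type="B&amp;W"/>
      <disney:File disney:set="2" disney:file="def.mov" disney:type="Col"/>
      <disney:File disney:set="3" disney:file="wzt.mov" disney:type="B&amp;W"/>
    </disney:Files>
  </disney:Filing>
</item>'

doc <- xmlParse(xml_text, asText = TRUE)

# Bind our own prefix "d" to the (assumed) namespace URI
ns <- c(d = "http://www.waltdisney.com/archives")

# Hypothetical helper: one data frame row per node matched by the XPath,
# one column per attribute of that node
files_to_df <- function(doc, xpath) {
  nodes <- getNodeSet(doc, xpath, namespaces = ns)
  as.data.frame(do.call(rbind, lapply(nodes, xmlAttrs)),
                stringsAsFactors = FALSE)
}

# All disney:File rows
all_files <- files_to_df(doc, "//d:File")

# Only the node whose disney:set attribute equals "2"
set2 <- files_to_df(doc, "//d:File[@d:set='2']")
```

Filtering inside the XPath predicate keeps the selection in one place; alternatively, all rows could be extracted first and the resulting data frame subset in R.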