Answer 0 (score: 4)
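There is no way around it: you need to understand XML and XPath to work with this in R. Assuming you do, look at the document in a browser to understand its structure. Then the following should get you started with the XML package.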
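library(XML)
# Parse the IATI file (URL taken from the question)
xml <- xmlParse("http://data.mcc.gov/raw/xml/MCC_HN.xml")
# One xpathApply() call per field; each returns a list with one element per matching node
org <- xpathApply(xml,"//iati-activity/reporting-org",xmlValue)
id <- xpathApply(xml,"//iati-activity/iati-identifier",xmlValue)
title <- xpathApply(xml,"//iati-activity/title",xmlValue)
desc.1 <- xpathApply(xml,"//iati-activity/description[@type='1']",xmlValue)
desc.2 <- xpathApply(xml,"//iati-activity/description[@type='2']",xmlValue)
status <- xpathApply(xml,"//iati-activity/activity-status",xmlValue)
start.planned <- xpathApply(xml,"//iati-activity/activity-date[@type='start-planned']",xmlValue)
start.actual <- xpathApply(xml,"//iati-activity/activity-date[@type='start-actual']",xmlValue)
end.planned <- xpathApply(xml,"//iati-activity/activity-date[@type='end-planned']",xmlValue)
end.actual <- xpathApply(xml,"//iati-activity/activity-date[@type='end-actual']",xmlValue)
# Flatten each list to a character vector and combine them column-wise.
# This assumes every <iati-activity> has exactly one of each element; if any are
# missing, the columns have different lengths and as.data.frame() will either
# stop with an error or recycle values so that rows no longer line up.
fields <- list(org = org, id = id, title = title, status = status,
               start.planned = start.planned, start.actual = start.actual,
               end.planned = end.planned, end.actual = end.actual,
               desc.1 = desc.1, desc.2 = desc.2)
df <- as.data.frame(lapply(fields, unlist), stringsAsFactors = FALSE)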
Read the documentation for the functions used above, for example xmlParse(...), xpathApply(...) and xmlValue(...), to see what the code does.
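Note: the XML package has a function xmlToDataFrame(...). The problem with your document is that it has several elements with the same tag name (for example description and activity-date) that are told apart only by their type= attribute. xmlToDataFrame(...) does not know how to handle that, so you have to do it yourself, as above.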
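A more generic alternative is to name each child element by its tag plus its type attribute (description.1, activity-date.start-planned, ...) and then line the rows up on the union of those names. This is a sketch, not a feature of the XML package: flatten_activity and the sample document are hypothetical, and it assumes the type attribute is the only thing that distinguishes repeated tags.

```r
library(XML)

# Hypothetical sample: same-named children told apart only by type=.
txt <- '<iati-activities>
  <iati-activity>
    <iati-identifier>A-1</iati-identifier>
    <description type="1">Short text</description>
    <description type="2">Long text</description>
    <activity-date type="start-planned">2011-01-01</activity-date>
    <activity-date type="end-planned">2015-12-31</activity-date>
  </iati-activity>
  <iati-activity>
    <iati-identifier>A-2</iati-identifier>
    <description type="1">Only a short text</description>
  </iati-activity>
</iati-activities>'
doc <- xmlParse(txt, asText = TRUE)

# Hypothetical helper: flatten one activity into a named character vector,
# naming each child element "tag", or "tag.type" when it has a type attribute.
flatten_activity <- function(node) {
  kids  <- getNodeSet(node, "./*")  # element children only
  tags  <- sapply(kids, xmlName)
  types <- sapply(kids, xmlGetAttr, name = "type", default = "")
  nms   <- ifelse(types == "", tags, paste(tags, types, sep = "."))
  setNames(sapply(kids, xmlValue), nms)
}

rows <- lapply(getNodeSet(doc, "//iati-activity"), flatten_activity)
cols <- unique(unlist(lapply(rows, names)))

# Align every row on the union of column names; absent fields become NA.
mat <- do.call(rbind, lapply(rows, function(r) setNames(r[cols], cols)))
df  <- as.data.frame(mat, stringsAsFactors = FALSE)
```

If the same tag and type occur twice in one activity, this sketch keeps only the first when aligning columns, so check your data for that case.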
Answer 1 (score: 0)
It's not entirely clear what you want to do with the data, but here is a start. Parse the file:
library(XML)
xml <- xmlParse("http://data.mcc.gov/raw/xml/MCC_HN.xml")
Then query all transaction records and turn the result into a data frame:
df <- xmlToDataFrame(xml["//transaction"])
This gives:
> dim(df)
[1] 730 11
> head(df, 2)
aid-type
1
2
description
1 Commitment: Honduras-614G Fund-Not Applicable-Not Applicable-2011-04-01
2 Disbursement: Honduras-614G Fund-Not Applicable-Not Applicable-2011-04-01
disbursement-channel finance-code flow-type provider-org
1 Millennium Challenge Corporation
2 Millennium Challenge Corporation
receiver-org tied-status transaction-date transaction-type value
1 Honduras 2011-04-01 COMMITMENT 274380.75
2 Honduras 2011-04-01 DISBURSEMENT 0.00
Perhaps you also want the attribute attached to aid-type (its code) as a column of the data frame; use XPath for that:
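# Assumes every transaction has exactly one <aid-type> with a code attribute;
# otherwise the number of codes will not match nrow(df)
df$`aid-type-code` <- as.character(xml["//aid-type/@code"])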
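Keep in mind that //aid-type/@code returns one value per <aid-type> element in the whole document, not one per transaction. If some transactions have no <aid-type>, the codes shift out of line with the rows, and a vector whose length divides nrow(df) is silently recycled. A sketch that stays aligned, assuming the XML package: evaluate the query relative to each transaction node and use NA when it is missing. The sample data and the helpers child_text and aid_type_code are hypothetical.

```r
library(XML)

# Hypothetical sample: the second transaction has no <aid-type> element.
txt <- '<iati-activities><iati-activity>
  <transaction>
    <aid-type code="C01"/>
    <transaction-type code="C">COMMITMENT</transaction-type>
    <value>274380.75</value>
  </transaction>
  <transaction>
    <transaction-type code="D">DISBURSEMENT</transaction-type>
    <value>0.00</value>
  </transaction>
</iati-activity></iati-activities>'
doc <- xmlParse(txt, asText = TRUE)

# Hypothetical helper: text of the first child `tag` of `node`, or NA.
child_text <- function(node, tag) {
  hit <- getNodeSet(node, paste0("./", tag))
  if (length(hit) == 0) NA_character_ else xmlValue(hit[[1]])
}

# Hypothetical helper: code attribute of this transaction's own <aid-type>, or NA.
aid_type_code <- function(node) {
  hit <- getNodeSet(node, "./aid-type")
  if (length(hit) == 0) NA_character_
  else xmlGetAttr(hit[[1]], "code", default = NA_character_)
}

transactions <- getNodeSet(doc, "//transaction")
df <- data.frame(
  `transaction-type` = sapply(transactions, child_text, tag = "transaction-type"),
  value              = sapply(transactions, child_text, tag = "value"),
  check.names = FALSE, stringsAsFactors = FALSE
)
# One code per transaction, in the same document order as the rows
df$`aid-type-code` <- sapply(transactions, aid_type_code)
```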