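I am very new to R and am trying to merge and parse several XML files. I imported a CSV in which one column contains 178 XML addresses.
I would like to fetch these XML addresses, combine them into one large XML file, and then parse that into a data frame. Ultimately, I want to export this data frame as a CSV.
I have installed the XML and xml2 packages. I then followed a tutorial and tried the xmlTreeParse function on a single XML address (http://ec.europa.eu/europeaid/files/iati/XI-IATI-EC_DEVCO_C_AG.xml).
I also imported the CSV with the 178 addresses.
But I don't know how to get a data frame out of what I have at this point.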
# Load the necessary packages (install them first with install.packages() if needed)
library(XML)
library(xml2)
# Save the URL of the xml file in a variable
xml.url <- "http://ec.europa.eu/europeaid/files/iati/XI-IATI-EC_DEVCO_C_AG.xml"
# Use the xmlTreeParse function to parse the XML file directly from the web
xmlfile <- xmlTreeParse(xml.url)
# The xml file is now saved as an object you can easily work with in R
class(xmlfile)
# Use the xmlRoot-function to access the top node
xmltop <- xmlRoot(xmlfile)
# Have a look at the XML code of the first two subnodes
print(xmltop[1:2])
# To extract the XML-values from the document, use xmlSApply
devcoafgh <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))
# Finally, get the data in a data-frame and have a look at the first rows and columns (PROBLEM)
devcoafgh_df <- data.frame(t(devcoafgh),row.names=NULL)
devcoafgh_df[1:5,1:4]
# Just 3 tests
print(devcoafgh)
print(xmlfile)
write.csv(devcoafgh_df, file = "afghdata.csv")
# Tests done
# Import data containing all XML addresses
xmladdresses <- read.csv("xml_addresses.csv", stringsAsFactors = FALSE)
# Extract the fifth column, which holds the URLs, as a character vector
xmlurls <- xmladdresses[[5]]
# Keep all URLs (178 in total) in one vector for later processing
xml.list <- xmlurls
In the end, I would like one big data frame that combines the 178 XML files, which I can then parse and export.
Answer 0 (score: 0)
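I'm not sure whether this is what you want, but for your single example XML file this creates a tibble containing all the information
(if any piece of information is missing, it is simply set to NA).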
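A minimal sketch of that idea with xml2, extended to several files, might look like the following. It assumes each file has an <iati-activities> root with one <iati-activity> element per record, and that <iati-identifier> and <title> are the fields of interest; adjust the XPath expressions to the real files. The inline XML strings, the helper name parse_activities, and the column names are made up for illustration, and the inline strings stand in for the downloaded files so the example runs offline. It returns a plain data frame, which tibble::as_tibble() can convert if a tibble is preferred.

```r
library(xml2)

# Hypothetical stand-ins for two of the XML files (assumed structure).
doc_a <- '<iati-activities>
  <iati-activity><iati-identifier>A-1</iati-identifier><title>First</title></iati-activity>
  <iati-activity><iati-identifier>A-2</iati-identifier></iati-activity>
</iati-activities>'
doc_b <- '<iati-activities>
  <iati-activity><iati-identifier>B-1</iati-identifier><title>Third</title></iati-activity>
</iati-activities>'

# Turn one document into a data frame with one row per <iati-activity>.
# xml_find_first() returns a missing node where a field is absent,
# and xml_text() turns that missing node into NA.
parse_activities <- function(src) {
  doc  <- read_xml(src)                       # src: URL, file path, or literal XML
  acts <- xml_find_all(doc, "//iati-activity")
  data.frame(
    id    = xml_text(xml_find_first(acts, "./iati-identifier")),
    title = xml_text(xml_find_first(acts, "./title")),
    stringsAsFactors = FALSE
  )
}

# Parse every source and stack the per-file data frames into one.
sources <- c(doc_a, doc_b)
all_df  <- do.call(rbind, lapply(sources, parse_activities))
print(all_df)
```

With the real data, the vector of 178 URLs read from the CSV would take the place of sources, and write.csv(all_df, "all_activities.csv", row.names = FALSE) would export the combined result. Parsing each file separately and binding the rows avoids having to build one large XML document first.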