I am using the XML package written by Duncan Temple Lang. When I use the xmlToDataFrame()
function to convert an XML document to a data frame, I get this error:
Error in `[<-.data.frame`(`*tmp*`, i, names(nodes[[i]]), value = c("\n Albania\n ", :
duplicate subscripts for columns
How can I handle this?
Full code:
library(XML)
xml.url <- 'http://www.cs.washington.edu/research/xmldatasets/data/mondial/mondial-3.0.xml'
xml.file <- xmlParse(xml.url)
xml.df <- xmlToDataFrame(xml.file)
# or pass the URL directly; it makes no difference
xml.df <- xmlToDataFrame(xml.url)
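Answer 0 (score: 1)
For at least some of the nodes you can use xmlAttrsToDataFrame.
Cities have both attributes and values, and some tags, such as the city name, can be repeated, so for those you will need to write your own function.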
XML:::xmlAttrsToDataFrame(xml.file["//country"])
      id    name capital population datacode total_area population_growth infant_mortality gdp_agri gdp_total inflation indep_date
1 f0_136 Albania f0_1461    3249136       AL      28750              1.34             49.2       55      4100        16 28 11 1912
2 f0_144 Andorra f0_1464      72766       AN        450              2.96              2.2     <NA>      1000      <NA>       <NA>
3 f0_149 Austria f0_1467    8023244       AU      83850              0.41              6.2        2    152000       2.3 12 11 1918
4 f0_157 Belarus f0_1474   10415973       BO     207600               0.2             13.4       21     49200       244 25 08 1991
...
XML:::xmlAttrsToDataFrame(xml.file['//country/province'])
        id       name country capital population area
1 f0_17440 Burgenland  f0_149 f0_2291     273000 3965
2 f0_17443  Carinthia  f0_149 f0_2296     559000 9533
3 f0_17445 Vorarlberg  f0_149 f0_2301     341000 2601
4 f0_17447     Vienna  f0_149 f0_1467    1583000  415
...
XML:::xmlAttrsToDataFrame(xml.file['//country[@name="Germany"]/province'])
        id              name country capital population  area
1 f0_17529 Baden Wurttemberg  f0_220 f0_2628   10272069 35742
2 f0_17531            Bayern  f0_220 f0_2712   11921944 70546
3 f0_17533            Berlin  f0_220 f0_1515    3472009   889
4 f0_17534       Brandenburg  f0_220 f0_2634    2536747 29480
...
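For nodes such as the cities, where a child tag like name can occur more than once, a custom function could look roughly like the sketch below. It is only an illustration under some assumptions: the inline XML is a made-up sample that imitates the Mondial layout rather than the real file, and nodeToRow and nodesToDataFrame are hypothetical helper names, not part of the XML package. Repeated child tags are collapsed into one "; "-separated string so that every column name is unique.

```r
library(XML)

# Made-up sample imitating the Mondial layout (an assumption for illustration).
# The first city has two <name> children, the second has no <population>.
doc <- xmlParse('<mondial>
  <country id="c1" name="Albania">
    <city id="x1" country="c1">
      <name>Tirane</name>
      <name>Tirana</name>
      <population year="87">192000</population>
    </city>
    <city id="x2" country="c1">
      <name>Shkoder</name>
    </city>
  </country>
</mondial>', asText = TRUE)

# Hypothetical helper: turn one node into a named character vector made of
# its attributes plus its child element values, collapsing repeated tags.
nodeToRow <- function(node) {
  kids <- getNodeSet(node, "./*")                 # element children only
  vals <- vapply(kids, xmlValue, character(1))
  tags <- vapply(kids, xmlName, character(1))
  children <- if (length(vals)) {
    sapply(split(vals, tags), paste, collapse = "; ")
  } else {
    character(0)
  }
  c(xmlAttrs(node), children)
}

# Hypothetical helper: bind the rows, filling columns a node lacks with NA.
nodesToDataFrame <- function(nodes) {
  rows <- lapply(nodes, nodeToRow)
  cols <- unique(unlist(lapply(rows, names)))
  mat  <- do.call(rbind, lapply(rows, function(r) r[cols]))
  dimnames(mat) <- list(NULL, cols)
  as.data.frame(mat, stringsAsFactors = FALSE)
}

cities <- nodesToDataFrame(getNodeSet(doc, "//city"))
print(cities)
```

On the real document the same call would be nodesToDataFrame(getNodeSet(xml.file, "//city")). All columns come back as character, so numeric fields such as population still need as.numeric(), and if an attribute and a child tag share a name the resulting duplicate column would have to be renamed first.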