Error converting XML to a data frame with xmlToDataFrame() from the XML package in R

Asked: 2016-05-14 16:48:18

Tags: xml r

I am working with this XML file (Mondial dataset).

I am using the XML package written by Duncan Temple Lang. When I convert the file to a data frame with the xmlToDataFrame() function, I get this error:

Error in `[<-.data.frame`(`*tmp*`, i, names(nodes[[i]]), value = c("\n       Albania\n     ",  : 
  duplicate subscripts for columns

How can I deal with this?

Full code:

    library(XML)
    xml.url <- 'http://www.cs.washington.edu/research/xmldatasets/data/mondial/mondial-3.0.xml'
    xml.file <- xmlParse(xml.url)
    xml.df <- xmlToDataFrame(xml.file)

    # or this way, which makes no difference either
    xml.df <- xmlToDataFrame(xml.url)

1 answer:

Answer 0: (score: 1)

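For at least some of the nodes, you can use xmlAttrsToDataFrame (an unexported function, hence the XML::: prefix). Cities, however, carry both attributes and child values, and some tags, such as the city name, can be repeated, so for those you will need to write your own function.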

    XML:::xmlAttrsToDataFrame(xml.file["//country"])
           id                   name capital population datacode total_area population_growth infant_mortality gdp_agri gdp_total inflation indep_date
    1  f0_136                Albania f0_1461    3249136       AL      28750              1.34             49.2       55      4100        16 28 11 1912
    2  f0_144                Andorra f0_1464      72766       AN        450              2.96              2.2     <NA>      1000      <NA>       <NA>
    3  f0_149                Austria f0_1467    8023244       AU      83850              0.41              6.2        2    152000       2.3 12 11 1918
    4  f0_157                Belarus f0_1474   10415973       BO     207600               0.2             13.4       21     49200       244 25 08 1991
    ...

    XML:::xmlAttrsToDataFrame(xml.file['//country/province'])
            id          name country capital population  area
    1 f0_17440    Burgenland  f0_149 f0_2291     273000  3965
    2 f0_17443     Carinthia  f0_149 f0_2296     559000  9533
    3 f0_17445    Vorarlberg  f0_149 f0_2301     341000  2601
    4 f0_17447        Vienna  f0_149 f0_1467    1583000   415
    ...

    XML:::xmlAttrsToDataFrame(xml.file['//country[@name="Germany"]/province'])
             id                   name country capital population  area
    1  f0_17529      Baden Wurttemberg  f0_220 f0_2628   10272069 35742
    2  f0_17531                 Bayern  f0_220 f0_2712   11921944 70546
    3  f0_17533                 Berlin  f0_220 f0_1515    3472009   889
    4  f0_17534            Brandenburg  f0_220 f0_2634    2536747 29480
    ...
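
For nodes such as cities, which mix attributes with child values and may repeat a tag like name, a custom function along the following lines might work. This is only a minimal sketch, not a tested solution for the full Mondial file: the inline XML is made-up sample data, node_to_row is a hypothetical helper name, and collapsing repeated tags into one "; "-separated string is just one possible design choice.

```r
library(XML)

# Made-up sample data that imitates the shape of a Mondial city node:
# attributes plus child elements, where <name> may appear more than once.
sample.xml <- '<mondial>
  <city id="c1" country="f0_136">
    <name>Tirane</name>
    <name>Tirana</name>
    <population year="87">192000</population>
  </city>
  <city id="c2" country="f0_149">
    <name>Vienna</name>
    <population year="94">1583000</population>
  </city>
</mondial>'

doc <- xmlParse(sample.xml, asText = TRUE)

# Hypothetical helper: turn one node into a named character vector made of
# its attributes followed by its child element values. Repeated child tags
# are collapsed into a single "; "-separated string, so no name occurs twice
# among the child values.
node_to_row <- function(node) {
  attrs  <- xmlAttrs(node)              # named character vector, or NULL
  kids   <- getNodeSet(node, "./*")     # element children only
  values <- vapply(kids, xmlValue, character(1))
  tags   <- vapply(kids, xmlName, character(1))
  collapsed <- vapply(split(values, tags), paste, character(1),
                      collapse = "; ")
  c(attrs, collapsed)
}

rows <- lapply(getNodeSet(doc, "//city"), node_to_row)

# Nodes need not share the same fields, so align every row on the union of
# all names; fields a node lacks become NA. If an attribute and a child tag
# share a name, the attribute is the one picked up here.
all.cols <- unique(unlist(lapply(rows, names)))
mat <- do.call(rbind, lapply(rows, function(r) unname(r[all.cols])))
colnames(mat) <- all.cols

city.df <- as.data.frame(mat, stringsAsFactors = FALSE)
print(city.df)
```

All columns come back as character, so numeric fields such as population would still need as.numeric() afterwards. Attributes of the child elements themselves (the year on population, for instance) are ignored in this sketch.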