R: XML to data frame with sub-columns

Time: 2015-10-30 01:02:12

Tags: xml r

I have fairly simple XML data to parse in R. The only problem is that, for each observation, one variable is split into sub-variables, i.e.:

<item>
  <location>Memphis, TN</location>
  <bidLevel value="Bid Level">
     <lowPrice>2.76</lowPrice>
     <highPrice>2.78</highPrice>
  </bidLevel>
</item>

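This means that a plain xmlToDataFrame does not work: it collapses the two bidLevel sub-columns (lowPrice and highPrice) into one. The solution I found is to convert to a list instead of a data frame, unlist each element, and bind them back together, which is of course not very elegant (and rather slow). Is there a more elegant way?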
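One commonly used alternative to flattening lists is to query each leaf field with XPath and build one column per query. The following is a minimal sketch using the XML package, assuming every `<item>` contains exactly one `location`, `bidLevel`, `lowPrice` and `highPrice`. The `<results>` root element, the inline sample and the column names are assumptions made for the example, not taken from the USDA report.

```r
library(XML)

# The <item> from the question, wrapped in an assumed <results> root
# so the example runs without the HTTP request.
xml_txt <- '<results>
  <item>
    <location>Memphis, TN</location>
    <bidLevel value="Bid Level">
      <lowPrice>2.76</lowPrice>
      <highPrice>2.78</highPrice>
    </bidLevel>
  </item>
</results>'

doc <- xmlParse(xml_txt, asText = TRUE)

# One XPath query per output column. This assumes every <item> has exactly
# one of each field; otherwise the columns would not line up.
bid_df <- data.frame(
  location  = xpathSApply(doc, "//item/location", xmlValue),
  bidLevel  = xpathSApply(doc, "//item/bidLevel", xmlGetAttr, "value"),
  lowPrice  = as.numeric(xpathSApply(doc, "//item/bidLevel/lowPrice", xmlValue)),
  highPrice = as.numeric(xpathSApply(doc, "//item/bidLevel/highPrice", xmlValue)),
  stringsAsFactors = FALSE
)
print(bid_df)
```

Because each column comes from its own XPath query, nested fields such as `bidLevel/lowPrice` become separate columns without building an intermediate list. If some items may lack a field, iterating over `getNodeSet(doc, "//item")` and handling missing children per node would be safer.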

My solution:

library(httr)
library(XML)


url_small <- "https://www.marketnews.usda.gov/mnp/ls-report?&endDateGrain=01/02/2001&commDetail=SOFT+RED+WINTER~NONE~US+NO+2&repMonth=1&endDateWeekly=&repType=Daily&rtype=&fsize=&_use=1&use=&_fsize=1&byproducts=&run=Run&pricutvalue=&repDateGrain=01/01/2001&runReport=true&grade=&regionsDesc=Chicago,+IL,+St.+Louis,+MO,+Cincinnati,+OH,+Toledo,+OH,+Memphis,+TN&subprimals=&mscore=&endYear=2015&repDateWeekly=&_wrange=1&endDateWeeklyGrain=&repYear=2015&loc=Chicago,+IL&loc=St.+Louis,+MO&loc=Cincinnati,+OH&loc=Toledo,+OH&loc=Memphis,+TN&_loc=1&wrange=&_grade=1&repDateWeeklyGrain=&_byproducts=1&organic=NO&category=Grain&_mscore=1&subComm=Wheat&commodity=Coarse&_commDetail=1&_subprimals=1&cut=&endMonth=1&repDate=01/01/2001&endDate=01/02/2001&format=xml"
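get_2000 <- GET(URLencode(url_small))

# Parse the response text with XML (content() may return an xml2 document in newer httr)
li_i <- xmlToList(xmlParse(content(get_2000, as = "text"), asText = TRUE))
# Flatten each <item> into a named vector and stack the vectors as rows
df_i <- do.call("rbind", lapply(li_i, unlist))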
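To see what the list-based approach actually produces, here is a self-contained sketch that runs it on the `<item>` from the question instead of the live request. The `<results>` root is an assumption, and the final conversion to a numeric data frame is an extra step that is not part of the original code.

```r
library(XML)

# Same <item> as in the question, under an assumed <results> root.
xml_txt <- '<results>
  <item>
    <location>Memphis, TN</location>
    <bidLevel value="Bid Level">
      <lowPrice>2.76</lowPrice>
      <highPrice>2.78</highPrice>
    </bidLevel>
  </item>
</results>'

li <- xmlToList(xmlParse(xml_txt, asText = TRUE))

# unlist() flattens each item, naming nested leaves "parent.child";
# the attribute ends up as "bidLevel..attrs.value".
rows <- lapply(li, unlist)
mat <- do.call(rbind, rows)

# rbind() returns a character matrix: convert it to a data frame
# and make the price columns numeric.
df_items <- as.data.frame(mat, stringsAsFactors = FALSE)
df_items$bidLevel.lowPrice  <- as.numeric(df_items$bidLevel.lowPrice)
df_items$bidLevel.highPrice <- as.numeric(df_items$bidLevel.highPrice)
print(df_items)
```

Note that `rbind()` stacks the flattened vectors by position, not by name, so this only works reliably when every item has the same fields in the same order.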

Thanks!

0 answers
