I have a fairly simple XML data set to parse in R. The only problem is that, for each observation, one variable is stored in child elements, i.e.:
<item>
<location>Memphis, TN</location>
<bidLevel value="Bid Level">
<lowPrice>2.76</lowPrice>
<highPrice>2.78</highPrice>
</bidLevel>
</item>
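This means a simple xmlToDataFrame doesn't work: it collapses the two children of bidLevel (lowPrice and highPrice) into a single bidLevel column. The solution I found is to convert to a list instead of a data frame, unlist each element, and then bind them together again. That is certainly not very elegant (and it is rather slow). Is there anything more elegant?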
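To make the list-based workaround concrete, here is a minimal offline sketch applied to the sample item alone. It builds the document from a string instead of downloading it, and wraps the item in a root element named `results`, which is a hypothetical stand-in (the real report's root may differ). It shows that `xmlToList()` keeps `bidLevel` as a nested list, and that `unlist()` flattens it into dotted names such as `bidLevel.lowPrice`.

```r
library(XML)

# The sample <item> from the question, wrapped in a root element.
# "results" is a hypothetical root name; the real report's root may differ.
xml_text <- paste0(
  "<results><item>",
  "<location>Memphis, TN</location>",
  "<bidLevel value=\"Bid Level\">",
  "<lowPrice>2.76</lowPrice><highPrice>2.78</highPrice>",
  "</bidLevel>",
  "</item></results>"
)
doc <- xmlParse(xml_text, asText = TRUE)

# xmlToList() keeps bidLevel as a nested list (lowPrice, highPrice, .attrs)
li <- xmlToList(doc)
str(li)

# unlist() flattens each item into a named character vector
# (e.g. "bidLevel.lowPrice", "bidLevel.highPrice"); rbind() stacks them as rows
mat <- do.call(rbind, lapply(li, unlist))
print(mat)
```

Note that the `value` attribute of `bidLevel` also ends up as an extra column, and that `rbind()` combines vectors by position rather than by name, so this approach assumes every item has the same fields in the same order.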
My solution:
library(httr)
library(XML)
url_small <- "https://www.marketnews.usda.gov/mnp/ls-report?&endDateGrain=01/02/2001&commDetail=SOFT+RED+WINTER~NONE~US+NO+2&repMonth=1&endDateWeekly=&repType=Daily&rtype=&fsize=&_use=1&use=&_fsize=1&byproducts=&run=Run&pricutvalue=&repDateGrain=01/01/2001&runReport=true&grade=&regionsDesc=Chicago,+IL,+St.+Louis,+MO,+Cincinnati,+OH,+Toledo,+OH,+Memphis,+TN&subprimals=&mscore=&endYear=2015&repDateWeekly=&_wrange=1&endDateWeeklyGrain=&repYear=2015&loc=Chicago,+IL&loc=St.+Louis,+MO&loc=Cincinnati,+OH&loc=Toledo,+OH&loc=Memphis,+TN&_loc=1&wrange=&_grade=1&repDateWeeklyGrain=&_byproducts=1&organic=NO&category=Grain&_mscore=1&subComm=Wheat&commodity=Coarse&_commDetail=1&_subprimals=1&cut=&endMonth=1&repDate=01/01/2001&endDate=01/02/2001&format=xml"
get_2000 <- GET(URLencode(url_small))
# Parse the body as text with XML::xmlParse (current httr parses XML into xml2 objects, which XML::xmlToList() does not accept)
doc_2000 <- xmlParse(content(get_2000, as = "text"), asText = TRUE)
# One list element per child of the root node
li_i <- xmlToList(doc_2000)
# Flatten each element into a named vector, then stack them as rows
df_i <- do.call("rbind", lapply(li_i, unlist))
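One alternative that skips the list round trip (sketched here on the same sample item, and not the method used above) is to query each column directly with XPath, using `xpathSApply()` and `xmlValue()` from the XML package. The root element name `results` is again a hypothetical stand-in, and the sketch assumes every `item` contains exactly one `location`, `lowPrice` and `highPrice`.

```r
library(XML)

# Same sample <item>; "results" is a hypothetical root element name.
xml_text <- paste0(
  "<results><item>",
  "<location>Memphis, TN</location>",
  "<bidLevel value=\"Bid Level\">",
  "<lowPrice>2.76</lowPrice><highPrice>2.78</highPrice>",
  "</bidLevel>",
  "</item></results>"
)
doc <- xmlParse(xml_text, asText = TRUE)

# Pull each column straight out of the tree with an XPath query
# instead of converting the whole tree to a list and flattening it.
# Assumes one of each field per <item>; a missing field would leave
# the column vectors with different lengths.
df <- data.frame(
  location  = xpathSApply(doc, "//item/location", xmlValue),
  lowPrice  = as.numeric(xpathSApply(doc, "//item/bidLevel/lowPrice", xmlValue)),
  highPrice = as.numeric(xpathSApply(doc, "//item/bidLevel/highPrice", xmlValue)),
  stringsAsFactors = FALSE
)
print(df)
```

Because each column is extracted separately, the prices can be converted to numbers right away, and the nested `bidLevel` never has to be flattened.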
Thanks!