I have the following code:
install.packages("XML")
library(XML)
install.packages("plyr")
library(plyr)
feed <- "http://feeds.reuters.com/Reuters/worldNews?format=xml"
data <- ldply(xmlToList(feed), data.frame)
However, it gives me the following error:
Error in data.frame(title = "Reuters: World News", link =
"http://www.reuters.com", : arguments imply differing number of
rows: 1, 3, 2
Why can't I load this XML, when I can load other XML such as www.w3schools.com/XQuery/books.xml?
Answer 0: (score: 3)
There is also the function xmlToDataFrame:
library(XML)
feed <- "http://feeds.reuters.com/Reuters/worldNews?format=xml"
(data <- xmlToDataFrame(xmlParse(feed)["/rss/channel/item"]))
# dplyr::glimpse(data)
# Variables:
# $ title (fctr) More than 60 migrants drown in boat sinking off Yemen:...
# $ link (fctr) http://feeds.reuters.com/~r/Reuters/worldNews/~3/p08tv...
# $ description (fctr) GENEVA (Reuters) - At least 60 African migrants and tw...
# $ category (fctr) worldNews, worldNews, worldNews, worldNews, worldNews,...
# $ pubDate (fctr) Fri, 06 Jun 2014 19:18:12 GMT, Fri, 06 Jun 2014 19:01:...
# $ guid (fctr) http://www.reuters.com/article/2014/06/06/us-yemen-mig...
# $ origLink (fctr) http://reuters.us.feedsportal.com/c/35217/f/654198/s/3...
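To try the same approach without fetching the feed, here is a minimal sketch against a small inline document. The `rss` string and its example.com URLs are hypothetical stand-ins for the Reuters feed, not its real content; the XPath is the same one used above.

```r
library(XML)

# Hypothetical, trimmed-down stand-in for the feed (not the real Reuters data).
rss <- paste0(
  '<rss version="2.0"><channel>',
  '<title>Example feed</title>',
  '<link>http://example.com</link>',
  '<item><title>First</title><link>http://example.com/1</link></item>',
  '<item><title>Second</title><link>http://example.com/2</link></item>',
  '</channel></rss>'
)

# Parse the string, then select only the <item> nodes with XPath.
doc <- xmlParse(rss, asText = TRUE)
item_nodes <- getNodeSet(doc, "/rss/channel/item")

# Each <item> becomes one row; its child elements become the columns.
items <- xmlToDataFrame(nodes = item_nodes)
print(items)
```

Selecting the item nodes first means the channel-level title, link, and similar elements never reach the conversion, so only the repeating records end up in the data frame.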
Answer 1: (score: 2)
I'm guessing you only want data.frames for all the "item" nodes in the result. If so, then
library(XML)
feed <- "http://feeds.reuters.com/Reuters/worldNews?format=xml"
reuters <- xmlToList(feed)
# reuters[[1]] is the channel; keep only its "item" children
lapply(reuters[[1]][names(reuters[[1]]) == "item"], data.frame)
should do it.
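If one data frame is wanted rather than a list of one-row data frames, the pieces can probably be stacked with `do.call(rbind, ...)`. This is a minimal sketch, assuming every item has the same child elements and none of them carry attributes (`rbind` would fail on mismatched columns). The inline `rss` string is a hypothetical stand-in for the feed.

```r
library(XML)

# Hypothetical, trimmed-down stand-in for the feed (not the real Reuters data).
rss <- paste0(
  '<rss version="2.0"><channel>',
  '<title>Example feed</title>',
  '<link>http://example.com</link>',
  '<item><title>First</title><link>http://example.com/1</link></item>',
  '<item><title>Second</title><link>http://example.com/2</link></item>',
  '</channel></rss>'
)

# Convert the parsed document to a nested list; the first element is the channel.
feed_list <- xmlToList(xmlParse(rss, asText = TRUE))
channel <- feed_list[[1]]

# One single-row data frame per <item>, as in the answer above.
item_frames <- lapply(channel[names(channel) == "item"],
                      data.frame, stringsAsFactors = FALSE)

# Stack the single-row data frames into one data frame.
items <- do.call(rbind, item_frames)
print(items)
```

With the real feed, items whose elements carry attributes may come back as nested lists, so some flattening might be needed before `rbind` works.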