Reading news with R

Asked: 2014-06-06 19:51:28

Tags: xml r dataframe

I have the following code:

install.packages("XML")
library(XML)
install.packages("plyr")
library(plyr)

feed <- "http://feeds.reuters.com/Reuters/worldNews?format=xml"
data <- ldply(xmlToList(feed), data.frame)

However, it gives me the following error:

Error in data.frame(title = "Reuters: World News", link =
"http://www.reuters.com",  :    arguments imply differing number of
rows: 1, 3, 2

Why can't I load this XML, when I can load other XML files such as www.w3schools.com/XQuery/books.xml?
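The error itself can be reproduced without the feed. The sketch below is only an illustration: `node` is a made-up stand-in for one parsed element whose fields have lengths 1, 3 and 2, which is the pattern the message reports. `data.frame()` can recycle a length-1 field, but it cannot reconcile lengths 3 and 2, so it stops.

```r
# Hypothetical stand-in for one parsed feed element (not real Reuters data):
# three fields whose lengths are 1, 3 and 2.
node <- list(
  title = "Example feed",
  image = c("a", "b", "c"),
  extra = c("x", "y")
)

# data.frame() on a list treats each element as a column, so the
# mismatched lengths trigger an error; capture its message instead of stopping.
res <- tryCatch(
  data.frame(node),
  error = function(e) conditionMessage(e)
)
print(res)
```

A flat file such as books.xml presumably avoids this because each record carries the same set of single-valued fields.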

2 answers:

Answer 0: (score: 3)

There is also the xmlToDataFrame function, which can be pointed at just the item nodes:

library(XML)
feed <- "http://feeds.reuters.com/Reuters/worldNews?format=xml"
(data <- xmlToDataFrame(xmlParse(feed)["/rss/channel/item"]))
# dplyr::glimpse(data)
# Variables:
# $ title       (fctr) More than 60 migrants drown in boat sinking off Yemen:...
# $ link        (fctr) http://feeds.reuters.com/~r/Reuters/worldNews/~3/p08tv...
# $ description (fctr) GENEVA (Reuters) - At least 60 African migrants and tw...
# $ category    (fctr) worldNews, worldNews, worldNews, worldNews, worldNews,...
# $ pubDate     (fctr) Fri, 06 Jun 2014 19:18:12 GMT, Fri, 06 Jun 2014 19:01:...
# $ guid        (fctr) http://www.reuters.com/article/2014/06/06/us-yemen-mig...
# $ origLink    (fctr) http://reuters.us.feedsportal.com/c/35217/f/654198/s/3...
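Since the feed URL may not always be reachable, here is a minimal offline sketch of the same idea. The RSS text in `rss` is invented for illustration and only mimics the channel/item layout; `stringsAsFactors = FALSE` is an optional choice that keeps the columns as character rather than factor.

```r
library(XML)

# Made-up two-item feed standing in for the Reuters RSS; only the structure matters.
rss <- '<rss version="2.0"><channel>
  <title>Example feed</title>
  <link>http://example.com</link>
  <item><title>First</title><link>http://example.com/1</link><category>worldNews</category></item>
  <item><title>Second</title><link>http://example.com/2</link><category>worldNews</category></item>
</channel></rss>'

# Parse the string, select only the <item> nodes, and build one row per item.
doc   <- xmlParse(rss, asText = TRUE)
items <- getNodeSet(doc, "/rss/channel/item")
news  <- xmlToDataFrame(nodes = items, stringsAsFactors = FALSE)
str(news)
```

Selecting the item nodes first skips the channel-level metadata (title, link and so on), which does not share the items' shape.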

Answer 1: (score: 2)

I'm guessing you only want data.frames for all the "item" nodes in the result. If so, then

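library(XML)
feed <- "http://feeds.reuters.com/Reuters/worldNews?format=xml"
reuters <- xmlToList(feed)
# Keep only the "item" children of the channel and convert each one to a data.frame
lapply(reuters[[1]][names(reuters[[1]]) == "item"], data.frame)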

should do it.
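The lapply call returns one data.frame per item. If a single table is wanted, the pieces can be stacked afterwards. The sketch below is a hedged illustration: `channel` is a hypothetical stand-in for `reuters[[1]]`, and it assumes every item is a flat list with the same single-valued fields.

```r
# Hypothetical stand-in for reuters[[1]]: channel metadata plus repeated "item" entries.
channel <- list(
  title = "Example feed",
  item  = list(title = "First",  link = "http://example.com/1"),
  item  = list(title = "Second", link = "http://example.com/2")
)

# Same selection as above: keep the "item" entries, one data.frame each.
items  <- channel[names(channel) == "item"]
frames <- lapply(items, data.frame, stringsAsFactors = FALSE)

# Stack the one-row data.frames; unname() drops the repeated "item" list names first.
news <- do.call(rbind, unname(frames))
print(news)
```

If the items do not all share the same fields, base rbind will fail; plyr::rbind.fill is one alternative that fills the missing columns with NA.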