How to read data from a website into R

Time: 2015-03-27 20:44:35

Tags: r

I am having trouble reading the list of universities from the following link.

http://www.usnews.com/education/best-global-universities/rankings

I tried

readHTMLTable("http://www.usnews.com/education/best-global-universities/rankings") 

...but it does not work.

I only need to read the university rankings in the middle of the page into R.

1 answer:

Answer 0 (score: 2)

As a starting point:

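library(XML)
# Parse the page into an HTML document tree
doc <- htmlParse("http://www.usnews.com/education/best-global-universities/rankings")
# Each ranking entry sits in a <div class="sep">; collect the text of its child nodes
res <- xpathApply(doc, "//div[@class='sep']", getChildrenStrings)
# Child 6 holds the name and location, child 2 holds the score:
# strip line breaks and tabs, collapse repeated whitespace, keep only digits and "." for the score
data.frame(uni = gsub("\\s\\s+", " ", gsub("[\n\t\r]", "", sapply(res, "[", 6))), 
           score = as.numeric(gsub("[^0-9.]", "", sapply(res, "[", 2))))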
#                                                                               uni score
# 1                      Harvard University United States Cambridge, Massachusetts  100.0
# 2   Massachusetts Institute of Technology United States Cambridge, Massachusetts   88.9
# 3          University of California--Berkeley United States Berkeley, California   88.0
# 4                         Stanford University United States Stanford, California   85.1
# 5                                     University of Oxford United Kingdom Oxford   83.6
# 6                               University of Cambridge United Kingdom Cambridge   83.3
# 7          California Institute of Technology United States Pasadena, California   80.3
# 8    University of California--Los Angeles United States Los Angeles, California   80.1
# 9                          University of Chicago United States Chicago, Illinois   77.4
# 10                          Columbia University United States New York, New York   77.3
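
The two gsub() calls in the answer do most of the cleanup, so it may help to see them in isolation. Below is a minimal sketch in base R that runs without network access. The strings raw_uni and raw_score are made-up stand-ins for what getChildrenStrings might return for one entry; they are assumptions, not text copied from the page. The trimws() call is an extra step that is not in the answer above (it needs R 3.2.0 or later) and only removes a leading or trailing space that the whitespace collapse can leave behind.

```r
# Hypothetical raw strings for one ranking entry (assumed shape, not real page content)
raw_uni   <- "\n\t  Harvard University \n\t\t United States   Cambridge, Massachusetts \r\n"
raw_score <- "Global Score: 100.0"

# Step 1: remove newlines, tabs and carriage returns
no_breaks <- gsub("[\n\t\r]", "", raw_uni)
# Step 2: collapse any run of two or more whitespace characters into one space
collapsed <- gsub("\\s\\s+", " ", no_breaks)
# Extra step (not in the answer): drop a leftover leading or trailing space
clean_uni <- trimws(collapsed)

# Keep only digits and the decimal point, then convert to a number
clean_score <- as.numeric(gsub("[^0-9.]", "", raw_score))

print(data.frame(uni = clean_uni, score = clean_score))
```

If the page layout changes, the child indices 6 and 2 used in the answer would probably need adjusting, but the cleanup steps should carry over as they are.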