rvest output returns "character(0)" instead of the columns highlighted with SelectorGadget

Asked: 2015-06-25 15:37:42

Tags: r web-scraping rvest

I am trying to scrape a few columns from the Gates Foundation's awarded grants table using rvest. Here is my code:

library(rvest)
data1 <- html('http://www.gatesfoundation.org/How-We-Work/Quick-Links/Grants-Database#q/program=US%20Program&year=2015')
table1 <- data1 %>% html_nodes('td:nth-child(5) , td:nth-child(3)') %>% html_text()
table1

The output I get from the "table1" command is:

character(0)

Is there something wrong with the CSS selector I am using? Is this type of table incompatible with rvest?

1 Answer:

Answer 0: (score: 2)

Here is sample code for the last two columns using RSelenium (the phantomjs driver needs to be in your working directory for the following code to run):

library(RSelenium)
library(rvest)
pJS <- phantom()
remDr <- remoteDriver(browserName = "phantomjs")
remDr$open(silent = FALSE)
remDr$navigate("http://www.gatesfoundation.org/How-We-Work/Quick-Links/Grants-Database#q/program=US%20Program&year=2015")

# html() is deprecated in newer versions of rvest; use read_html()
test.html <- read_html(remDr$getPageSource()[[1]])
test.text <- test.html %>%
  html_nodes("td:nth-child(5) , td:nth-child(3)") %>%
  html_text()
test.df <- data.frame(matrix(test.text, ncol = 2, byrow = TRUE))
names(test.df) <- c("program", "amount")
remDr$close()
pJS$stop()

test.df
                    program     amount
1     Postsecondary Success   $498,727
2          Community Grants   $200,000
3  Global Policy & Advocacy $1,035,523
4     Postsecondary Success    $95,000
5     Postsecondary Success    $25,000
6             College-Ready $1,257,526
7             College-Ready $1,066,403
8    Strategic Partnerships    $50,000
9             College-Ready $1,195,581
10            College-Ready   $300,000
11            College-Ready   $100,000
12            College-Ready    $21,200
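The likely reason the question's code returns character(0) while the RSelenium version works is that the grants table is filled in by JavaScript after the page loads, so the HTML that rvest downloads contains no matching td cells. The sketch below illustrates this offline. Both HTML strings (static_html and rendered_html) are simplified, hypothetical stand-ins, not the real page source, and the values in the first, second, and fourth columns are made up; only the selector and the program and amount values come from the thread.

```r
library(rvest)  # also re-exports the %>% pipe

selector <- "td:nth-child(5) , td:nth-child(3)"

# Assumption: stand-in for the HTML the server returns before any
# JavaScript has run. It has an empty container and no <td> cells.
static_html <- '<html><body><div id="results"></div></body></html>'

# Assumption: stand-in for the page source after the browser has rendered
# the table. Columns 3 and 5 hold the program and the amount.
rendered_html <- '<html><body><table>
  <tr><td>g1</td><td>2015</td><td>Postsecondary Success</td><td>x</td><td>$498,727</td></tr>
  <tr><td>g2</td><td>2015</td><td>Community Grants</td><td>x</td><td>$200,000</td></tr>
</table></body></html>'

static_text <- read_html(static_html) %>% html_nodes(selector) %>% html_text()
rendered_text <- read_html(rendered_html) %>% html_nodes(selector) %>% html_text()

length(static_text)    # 0: nothing matches, so html_text() gives character(0)
length(rendered_text)  # 4: two matched cells per row, in document order

# Matches arrive interleaved (program, amount, program, amount, ...),
# so fill the matrix by row to get one grant per row.
grants <- data.frame(matrix(rendered_text, ncol = 2, byrow = TRUE))
names(grants) <- c("program", "amount")
print(grants)
```

So the selector itself is fine. What matters is whether the document handed to html_nodes() already contains the table, which is why the answer takes the page source from a browser session instead of a plain HTTP request.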