I'm trying to scrape a few columns from the Gates Foundation awarded grants table using rvest. Here is my code:
library(rvest)
data1 <- read_html('http://www.gatesfoundation.org/How-We-Work/Quick-Links/Grants-Database#q/program=US%20Program&year=2015')  # read_html() replaces the deprecated html()
table1 <- data1 %>% html_nodes('td:nth-child(5) , td:nth-child(3)') %>% html_text()
table1
The output I get from table1 is:
character(0)
Is something wrong with the CSS selector I'm using, or is this kind of table incompatible with rvest?
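One way to tell the two possibilities apart is to run the same selector against small static HTML snippets. The sketch below assumes rvest ≥ 1.0 (its `read_html()` accepts a literal HTML string); both pages are hypothetical stand-ins, not the real Gates Foundation markup, and `loadGrants()` is a made-up function name. If the selector finds the cells in a static table but returns `character(0)` when the table is only filled in by a script, the selector is fine and the issue is that rvest does not run JavaScript.

```r
library(rvest)

# Hypothetical static page: the <td> cells are written directly in the HTML,
# with program in column 3 and amount in column 5 (other cells are filler).
static_page <- read_html('
<table>
  <tr><td>1</td><td>a</td><td>Postsecondary Success</td><td>x</td><td>$498,727</td></tr>
  <tr><td>2</td><td>b</td><td>Community Grants</td><td>y</td><td>$200,000</td></tr>
</table>')

# Hypothetical dynamic page: the table stays empty until a script fills it in.
# read_html() only parses the markup and never executes the script.
dynamic_page <- read_html('
<table id="grants"></table>
<script>loadGrants();</script>')

selector <- "td:nth-child(5) , td:nth-child(3)"

# Matches come back in document order: column 3, column 5, column 3, ...
static_text <- static_page %>% html_nodes(selector) %>% html_text()
# No <td> elements exist in the parsed HTML, so this is character(0)
dynamic_text <- dynamic_page %>% html_nodes(selector) %>% html_text()

static_text
dynamic_text
```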
Answer (score: 2):
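The table is built by JavaScript after the page loads, and rvest only downloads the raw HTML without running any scripts, so there are no td cells for html_nodes() to match and you get character(0). (The part of the URL after # is a fragment that the browser handles client-side; it is never sent to the server.) One workaround is to render the page in a headless browser with RSelenium and pass the rendered source to rvest. Here is sample code for the program and amount columns, i.e. the 3rd and 5th columns your selector targets (you need the phantomjs driver in your working directory for it to run). See here for details: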
library(RSelenium)
library(rvest)
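pJS <- phantom()  # start a PhantomJS process (needs the phantomjs binary)
remDr <- remoteDriver(browserName = "phantomjs")
remDr$open(silent = FALSE)
# Load the page in the headless browser so its JavaScript builds the table
remDr$navigate("http://www.gatesfoundation.org/How-We-Work/Quick-Links/Grants-Database#q/program=US%20Program&year=2015")
# Parse the rendered source; read_html() replaces html(), which is deprecated in newer rvest
test.html <- read_html(remDr$getPageSource()[[1]])
# Cells from column 3 (program) and column 5 (amount), returned in document order
test.text <- test.html %>%
  html_nodes("td:nth-child(5) , td:nth-child(3)") %>%
  html_text()
# Pair the alternating program/amount values into rows
test.df <- data.frame(matrix(test.text, ncol = 2, byrow = TRUE))
names(test.df) <- c("program", "amount")
# Shut down the browser session and the PhantomJS process
remDr$close()
pJS$stop()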
test.df
program amount
1 Postsecondary Success $498,727
2 Community Grants $200,000
3 Global Policy & Advocacy $1,035,523
4 Postsecondary Success $95,000
5 Postsecondary Success $25,000
6 College-Ready $1,257,526
7 College-Ready $1,066,403
8 Strategic Partnerships $50,000
9 College-Ready $1,195,581
10 College-Ready $300,000
11 College-Ready $100,000
12 College-Ready $21,200
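The matrix(..., byrow = TRUE) step works because html_nodes() returns the matched cells in document order, so the text vector alternates program, amount, program, amount. A small base-R check of that reshaping, using the first four rows printed above as input (no browser needed; stringsAsFactors = FALSE is added here so the columns stay character on older R versions):

```r
# Values copied from the first four rows of the output above,
# in the alternating order html_nodes() would return them.
test.text <- c("Postsecondary Success", "$498,727",
               "Community Grants", "$200,000",
               "Global Policy & Advocacy", "$1,035,523",
               "Postsecondary Success", "$95,000")

# byrow = TRUE fills each row left to right, so each consecutive
# (program, amount) pair lands on one row.
test.df <- data.frame(matrix(test.text, ncol = 2, byrow = TRUE),
                      stringsAsFactors = FALSE)
names(test.df) <- c("program", "amount")
test.df
```

This pairing assumes every table row has both a program cell and an amount cell; if one were missing, every later pair would be shifted by one.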