Scraping a web page

Date: 2018-04-07 22:31:38

Tags: r web-scraping

I am trying to scrape the following web page: http://www.qedu.org.br/escola/112633-militar-de-salvador/enem?edition=2009. Searching the internet, I found the following script:

library(magrittr)  # provides the %>% pipe

# Fetch the figures shown on the school's ENEM page for one edition (year).
fetch_attendance <- function(year) {
  url <- paste0("http://www.qedu.org.br/escola/112633-colegio-militar-de-salvador/enem?edition=", year)
  date <- url %>%
    httr::GET() %>%
    httr::content("text", encoding = "utf-8") %>%
    xml2::read_html() %>%
    rvest::html_nodes(xpath = '//div[@class="span4 score-ct"]/p[1]') %>%
    rvest::html_text() %>%
    gsub("^\\s+|\\s+$|score", "", .)  # trim whitespace and drop the word "score"
  date
}

library(plyr)

# One row per year, 2009 to 2016
res <- ldply(2009:2016, fetch_attendance, .progress = "text")

However, this is the result:

> res <- ldply(2009:2016, fetch_attendance, .progress = "text")
  |======================================================================| 100%
> res
   V1     V2     V3     V4     V5     V6
1 97% 635pts 605pts 595pts 659pts 735pts
2 97% 635pts 605pts 595pts 659pts 735pts
3 97% 635pts 605pts 595pts 659pts 735pts
4 97% 635pts 605pts 595pts 659pts 735pts
5 97% 635pts 605pts 595pts 659pts 735pts
6 97% 635pts 605pts 595pts 659pts 735pts
7 97% 635pts 605pts 595pts 659pts 735pts
8 97% 635pts 605pts 595pts 659pts 735pts

In other words, it only returns the 2009 results. Can anyone help me? I would also like to get the values for 2010 through 2016.
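A possible first diagnostic step, sketched under assumptions rather than offered as a confirmed fix: check whether the server really returns a different page for each `edition` value. The helpers `build_url`, `probe_year`, and `rows_all_identical` below are hypothetical names that do not appear in the script above, and the default base URL is the one quoted in the question text. If the status, final URL, and body size come back the same for every year, the `edition` parameter may not be what selects the data on this site (for instance, the figures might be loaded by JavaScript after the page loads), but that is a guess that has not been verified against the site.

```r
# Hypothetical helpers for checking whether each year yields a different page.

# Build the URL for one ENEM edition. `base` defaults to the address quoted
# in the question; swap it for the one your script actually requests.
build_url <- function(year,
                      base = "http://www.qedu.org.br/escola/112633-militar-de-salvador/enem") {
  paste0(base, "?edition=", year)
}

# TRUE when every row of a data frame is the same, which is the symptom
# shown in the output above.
rows_all_identical <- function(df) {
  nrow(unique(df)) <= 1
}

# Summarise one response: HTTP status, the URL finally served (after any
# redirects), and the number of characters in the body.
probe_year <- function(year) {
  resp <- httr::GET(build_url(year))
  body <- httr::content(resp, "text", encoding = "UTF-8")
  data.frame(
    year       = year,
    status     = httr::status_code(resp),
    final_url  = resp$url,
    body_chars = nchar(body),
    stringsAsFactors = FALSE
  )
}

# Needs network access, so it only runs in an interactive session.
if (interactive()) {
  probes <- do.call(rbind, lapply(2009:2016, probe_year))
  print(probes)
  # Compare everything except the year column across requests.
  print(rows_all_identical(probes[, c("status", "final_url", "body_chars")]))
}
```

Calling `rows_all_identical(res)` on the result above would confirm that all eight rows are the same. A `final_url` that differs from the requested one would suggest a redirect that drops the query string.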

0 answers:

No answers yet