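I am trying to scrape a website, but my current code keeps failing with HTTP error 429 (Too Many Requests).
I tried adding a fixed wait interval and then a randomized one, but neither helped: after roughly 100 observations I still get the error.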
library(rvest) # provides read_html(), html_node(), html_text()

noplayers <- 300 # number of players to loop over while testing the code
playeridtest <- playerid[1:noplayers] # take the first `noplayers` IDs from the full ID vector
playernames <- list()
playernames$id <- playeridtest
for (i in seq_along(playeridtest)) {
  # Build the player page URL and download it
  scoresway <- paste0("http://www.scoresway.com?sport=soccer&page=person&id=", playeridtest[i])
  scoresway <- read_html(scoresway)
  # First name
  urlnodescorefirst <- html_node(scoresway, "dd:nth-child(2)")
  urltextscorefirst <- html_text(urlnodescorefirst)
  playernames$first[i] <- urltextscorefirst
  # Surname
  urlnodescoresur <- html_node(scoresway, "dd:nth-child(4)")
  urltextscoresur <- html_text(urlnodescoresur)
  playernames$sur[i] <- urltextscoresur
  # Random pause of 0.1 to 1 second between requests
  Sys.sleep(sample(10, 1) * 0.1)
}
The full error is:
Error in open.connection(x, "rb") : HTTP error 429.
Can anyone suggest a way to resolve this error? I need to scrape about 98,000 observations.
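For context, one direction that might apply here is retrying with exponential backoff instead of a short fixed or random pause. The sketch below is only an illustration in base R, not a confirmed fix for this site: `retry_with_backoff`, its parameters, and the choice of 5 tries with a 1 second base delay are all hypothetical, and it assumes the error message contains the text "429", as in the error quoted above.

```r
# Hypothetical helper (not from any package): call `fetch()` and, if it fails
# with an error whose message mentions 429, wait and try again. The wait
# doubles after each failed attempt: base_delay, 2 * base_delay, 4 * base_delay, ...
# `sleep` is a parameter only so the waiting can be replaced when testing.
retry_with_backoff <- function(fetch, max_tries = 5, base_delay = 1,
                               sleep = Sys.sleep) {
  for (attempt in seq_len(max_tries)) {
    # Capture an error as a value instead of aborting
    result <- tryCatch(fetch(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    # Re-raise on the last attempt, or when the error is not a 429
    if (attempt == max_tries || !grepl("429", conditionMessage(result))) {
      stop(result)
    }
    sleep(base_delay * 2^(attempt - 1))
  }
}
```

Inside the loop above, the download line could then become something like `scoresway <- retry_with_backoff(function() read_html(scoresway))`. Whether the server lifts the limit after a few seconds or only after a much longer pause is unknown, so the delays would need tuning.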