Skip a value in a loop if the URL does not exist

Time: 2016-09-06 06:34:54

Tags: r web-scraping

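I am trying to write code that collects all NBA box scores for October. I want the code to try every URL formed by combining the dates (27-31) with the 30 teams. However, since not every team plays every day, some combinations will not exist, so I am trying to use the try function to skip the URLs that do not exist, but I can't seem to figure it out. This is what I have written so far: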

install.packages("XML")
library(XML)

teams = c('ATL','BKN','BOS','CHA','CHI',
      'CLE','DAL','DEN','DET','GS',
      'HOU','IND','LAC','LAL','MEM',
      'MIA','MIL','MIN','NOP','NYK',
      'OKC','ORL','PHI','PHX','POR',
      'SAC','SA','TOR','UTA','WSH')

october = c()

for (i in teams){
  for (j in (c(27:31))){
    url = paste("http://www.basketball-reference.com/boxscores/201510",
                     j,"0",i,".html",sep = "")
    data <- try(readHTMLTable(url, stringsAsFactors = FALSE))

    if(inherits(data, "error")) next

    away_1 = as.data.frame(data[1])
    colnames(away_1) = c("Players","MP","FG","FGA","FG%","3P","3PA","3P%","FT","FTA",
    "FT%", "ORB","DRB","TRB","AST","STL","BLK","TO","PF","PTS","+/-")

    away_1 = away_1[away_1$Players != "Reserves",]
    away_1 = away_1[away_1$MP != "Did Not Play",]

    away_1$team = rep(toupper(substr(names(as.data.frame(data[1]))[1], 
                           5, 7)),length(away_1$Players))
    away_1$loc = rep(i,length(away_1$Players))

    home_1 = as.data.frame(data[3])
    colnames(home_1) = c("Players","MP","FG","FGA","FG%","3P","3PA","3P%","FT","FTA",
     "FT%", "ORB","DRB","TRB","AST","STL","BLK","TO","PF","PTS","+/-")

    home_1 = home_1[home_1$Players != "Reserves",]
    home_1 = home_1[home_1$MP != "Did Not Play",]

    home_1$team = rep(toupper(substr(names(as.data.frame(data[2]))[1], 
                            5, 7)),length(home_1$Players))
    home_1$loc = rep(i,length(home_1$Players))

    game = rbind(away_1,home_1)

    october = rbind(october, game)
  }
}

Everything above and below the following lines seems to work:

data <- try(readHTMLTable(url, stringsAsFactors = FALSE))

if(inherits(data, "error")) next

I just need to get these two lines right.

2 answers:

Answer 0: (score: 0)

How about using tryCatch for error handling?

result = tryCatch({
    expr
}, warning = function(w) {
    warning-handler-code
}, error = function(e) {
    error-handler-code
}, finally = {
    cleanup-code
})

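Here readHTMLTable would be used as the main part ('expr'). If an error or warning occurs, you can simply return a missing value and then drop the missing values from the final result.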
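
As a minimal sketch of that idea, the loop below skips any URL whose read fails. The names safe_read and fake_reader and the example.com addresses are hypothetical and not part of the question; fake_reader only stands in for readHTMLTable(url, stringsAsFactors = FALSE) so that the sketch runs without network access. The second loop shows the same skip with try, which returns an object of class "try-error" rather than "error" when the call fails.

```r
# Hypothetical helper: run a reader on a URL and return NULL if it fails.
safe_read <- function(url, reader) {
  tryCatch(
    reader(url),
    error = function(e) NULL
  )
}

# Stand-in for readHTMLTable(url, stringsAsFactors = FALSE) so the sketch
# runs offline: it fails for every URL except one made-up address.
fake_reader <- function(url) {
  if (url != "http://example.com/exists.html") stop("page not found")
  list(data.frame(Players = "A", PTS = "10", stringsAsFactors = FALSE))
}

urls <- c("http://example.com/missing.html", "http://example.com/exists.html")

# Variant 1: tryCatch returns NULL on error, and the loop skips NULL results.
results <- list()
for (u in urls) {
  data <- safe_read(u, fake_reader)
  if (is.null(data)) next
  results[[u]] <- data[[1]]
}

# Variant 2: try() marks a failure with class "try-error".
kept <- character(0)
for (u in urls) {
  data <- try(fake_reader(u), silent = TRUE)
  if (inherits(data, "try-error")) next
  kept <- c(kept, u)
}
```

In the question's loop, the reader would be the readHTMLTable call, and the rest of the loop body would follow the skip check unchanged.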

Answer 1: (score: 0)

For anyone interested, I figured it out using url.exists from RCurl. Just put the following right after the line that defines the url:

if(url.exists(url) == TRUE){...}
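
A rough sketch of how that check could be applied to all date/team combinations at once is shown here. The helper keep_existing and the stand-in fake_exists are hypothetical names introduced for illustration; fake_exists replaces RCurl's url.exists only so the example runs offline, and the team list is shortened to three entries.

```r
# Build every date/team candidate URL with the same pattern as the question.
teams <- c("ATL", "BKN", "BOS")   # shortened list for the sketch
days  <- 27:31
grid  <- expand.grid(day = days, team = teams, stringsAsFactors = FALSE)
urls  <- paste0("http://www.basketball-reference.com/boxscores/201510",
                grid$day, "0", grid$team, ".html")

# Hypothetical helper: keep only the URLs for which exists_fn returns TRUE.
keep_existing <- function(urls, exists_fn) {
  ok <- vapply(urls, function(u) isTRUE(exists_fn(u)), logical(1))
  urls[ok]
}

# Offline stand-in for RCurl::url.exists, pretending only ATL pages exist.
fake_exists <- function(u) grepl("0ATL\\.html$", u)

found <- keep_existing(urls, fake_exists)

# With RCurl installed and network access, the real check would be:
# found <- keep_existing(urls, RCurl::url.exists)
```

Filtering first means the scraping loop only ever sees URLs that passed the check, at the cost of one extra request per candidate URL.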