R stops scraping when data is missing

Asked: 2020-09-23 04:01:25

Tags: r web-scraping

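I'm using this code to loop through multiple URLs and scrape data. The code works fine until it reaches a date with missing data. This is the error message that pops up: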

Error in data.frame(away, home, away1H, home1H, awayPinnacle, homePinnacle) : arguments imply differing number of rows: 7, 8

I'm very new to coding and can't figure out how to keep it scraping despite the missing data.

    library(rvest)
    library(dplyr)

    get_data <- function(date) {

      # Specifying URL
      url <- paste0('https://classic.sportsbookreview.com/betting-odds/nba-basketball/money-line/1st-half/?date=', date)

      # Reading the HTML code from website
      oddspage <- read_html(url)

      # Using CSS selectors to scrape away teams
      awayHtml <- html_nodes(oddspage,'.eventLine-value:nth-child(1) a')

      #Using CSS selectors to scrape 1Q scores
      away1QHtml <- html_nodes(oddspage,'.current-score+ .first')
      away1Q <- html_text(away1QHtml)
      away1Q <- as.numeric(away1Q)
      home1QHtml <- html_nodes(oddspage,'.score-periods+ .score-periods .current-score+ .period')
      home1Q <- html_text(home1QHtml)
      home1Q <- as.numeric(home1Q)

      #Using CSS selectors to scrape 2Q scores
      away2QHtml <- html_nodes(oddspage,'.first:nth-child(3)')
      away2Q <- html_text(away2QHtml)
      away2Q <- as.numeric(away2Q)
      home2QHtml <- html_nodes(oddspage,'.score-periods+ .score-periods .period:nth-child(3)')
      home2Q <- html_text(home2QHtml)
      home2Q <- as.numeric(home2Q)

      #Creating First Half Scores
      away1H <- away1Q + away2Q
      home1H <- home1Q + home2Q

      #Using CSS selectors to scrape scores
      awayScoreHtml <- html_nodes(oddspage,'.first.total')
      awayScore <- html_text(awayScoreHtml)
      awayScore <- as.numeric(awayScore)
      homeScoreHtml <- html_nodes(oddspage, '.score-periods+ .score-periods .total')
      homeScore <- html_text(homeScoreHtml)
      homeScore <- as.numeric(homeScore)

      # Converting away data to text
      away <- html_text(awayHtml)

      # Using CSS selectors to scrape home teams
      homeHtml <- html_nodes(oddspage,'.eventLine-value+ .eventLine-value a')

      # Converting home data to text
      home <- html_text(homeHtml)

      # Using CSS selectors to scrape Away Odds
      awayPinnacleHtml <- html_nodes(oddspage,'.eventLine-consensus+ .eventLine-book .eventLine-book-value:nth-child(1) b')

      # Converting Away Odds to Text
      awayPinnacle <- html_text(awayPinnacleHtml)

      # Converting Away Odds to numeric
      awayPinnacle <- as.numeric(awayPinnacle)

      # Using CSS selectors to scrape Pinnacle Home Odds
      homePinnacleHtml <- html_nodes(oddspage,'.eventLine-consensus+ .eventLine-book .eventLine-book-value+ .eventLine-book-value b')

      # Converting Home Odds to Text
      homePinnacle <- html_text(homePinnacleHtml)

      # Converting Home Odds to Numeric
      homePinnacle <- as.numeric(homePinnacle)

      # Create Data Frame
      df <- data.frame(away,home,away1H,home1H,awayPinnacle,homePinnacle)

    }

    date_vec <- sprintf('201902%02d', 02:06)

    all_data <- do.call(rbind, lapply(date_vec, get_data))

    View(all_data)

1 Answer:

Answer 0 (score: 2)

I'd suggest using purrr::map() instead of lapply(). You can then wrap the call to get_data() in possibly(), which is a nice way to catch errors and keep going.

    library(purrr)

    map_dfr(date_vec, possibly(get_data, otherwise = data.frame()))
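For intuition about what possibly() is doing here, the same catch-and-continue pattern can be sketched in base R with tryCatch(). The names toy_get_data and safe_get_data are hypothetical, and the toy function only imitates the failure from the question; it does not scrape anything.

```r
# Hypothetical stand-in for get_data(): on one date a column comes back
# short, so data.frame() raises the "differing number of rows" error.
toy_get_data <- function(date) {
  away <- c("Team A", "Team B", "Team C")
  home <- c("Team D", "Team E", "Team F")
  # Simulate a date where one value is missing from the page
  if (date == "20190203") home <- home[-3]
  data.frame(date = date, away = away, home = home)
}

# Hypothetical wrapper: return an empty data frame instead of stopping
# when fun(date) throws an error.
safe_get_data <- function(date, fun = toy_get_data) {
  tryCatch(
    fun(date),
    error = function(e) {
      message("Skipping ", date, ": ", conditionMessage(e))
      data.frame()
    }
  )
}

date_vec <- sprintf("201902%02d", 2:4)

# The failing date contributes an empty data frame, which rbind() ignores
all_data <- do.call(rbind, lapply(date_vec, safe_get_data))
```

In this sketch the failing date is reported with message() and skipped, while the other dates are bound together as usual.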

Output of the map_dfr() call:

            away         home away1H home1H awayPinnacle homePinnacle
1  L.A. Clippers      Detroit     47     65          116         -131
2      Milwaukee   Washington     73     50         -181          159
3        Chicago    Charlotte     60     51          192         -220
4       Brooklyn      Orlando     48     44          121         -137
5        Indiana        Miami     53     54          117         -133
6         Dallas    Cleveland     58     55         -159          140
7    L.A. Lakers Golden State     58     63          513         -651
8    New Orleans  San Antonio     50     63          298         -352
9         Denver    Minnesota     61     64          107         -121
10       Houston         Utah     63     50          186         -213
11       Atlanta      Phoenix     58     57          110         -125
12  Philadelphia   Sacramento     52     62         -139          123
13       Memphis     New York     42     41         -129          114
14 Oklahoma City       Boston     58     66          137         -156
15 L.A. Clippers      Toronto     51     65          228         -263
16       Atlanta   Washington     61     57          172         -196
17        Denver      Detroit     55     68         -112         -101
18     Milwaukee     Brooklyn     51     42         -211          184
19       Indiana  New Orleans     53     50         -143          127
20       Houston      Phoenix     63     57         -256          222
21   San Antonio   Sacramento     59     63         -124          110
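Note that with otherwise = data.frame(), any date on which get_data() fails contributes no rows at all. If partial rows are preferable to dropping the whole date, one possible approach is to pad the scraped vectors to a common length with NA before calling data.frame(). The sketch below uses a hypothetical helper, pad_to, on made-up vectors with the lengths from the error message (7 and 8). Padding at the end can misalign rows when the missing value belongs to a game in the middle of the page, so treat this as a starting point rather than a fix.

```r
# Hypothetical helper: extend a vector to length n by appending NA
pad_to <- function(x, n) {
  length(x) <- n
  x
}

# Made-up vectors with the lengths from the error message (8 and 7)
away <- paste("Away", 1:8)
awayPinnacle <- c(116, -181, 192, 121, 117, -159, 513)

cols <- list(away = away, awayPinnacle = awayPinnacle)

# Pad every column to the longest length, then build the data frame
n <- max(lengths(cols))
df <- as.data.frame(lapply(cols, pad_to, n = n))
```

Here df has 8 rows, and the last value of awayPinnacle is NA instead of triggering the error.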