Question

I'm not sure what is missing from my code. I'm trying to scrape the data from https://www.espn.com/nfl/standings/_/season/2010 into a tibble in R. This is my code so far:

library(tidyverse)
library(rvest)

# url I want the data from. 
NFL_2010.url <- "https://www.espn.com/nfl/standings/_/season/2010"
# Use webscraping to import the data from the url into R
NFL_2010 <- NFL_2010.url %>%
  read_html(NFL_2010) %>%
  #There is more than 1 table, so I'm trying to use html_nodes 
  html_nodes("table") %>%
  html_table () %>%
  #convert data to a tibble
  as_tibble()

What am I missing here?

Answer 1

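Scraping that page returns a list in which the tables are split into 4 parts. You therefore have to bind those pieces together and then convert the results into 2 tibbles. For example: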

library(tidyverse)
library(rvest)

NFL_2010.url <- "https://www.espn.com/nfl/standings/_/season/2010"

NFL_2010 <- NFL_2010.url %>%
  read_html() %>%
  html_nodes("table") %>%
  html_table()

# American Football Conference
NFL_2010_AFC <- bind_cols(NFL_2010[[1]], NFL_2010[[2]]) %>%
  as_tibble()

# National Football Conference
NFL_2010_NFC <- bind_cols(NFL_2010[[3]], NFL_2010[[4]]) %>%
  as_tibble()

After that, the data still needs some cleaning...
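
What the cleanup looks like depends on what the scraped pieces actually contain, so the following is only a rough sketch on made-up data, not real ESPN output. It assumes, hypothetically, that a header row is repeated inside the data and that the numbers arrive as text; the column names team, W and L and all values are placeholders.

```r
library(dplyr)

# Made-up stand-ins for two pieces of one table (not real ESPN data).
# Assumption: a header row is repeated inside the data and numbers arrive as text.
teams <- tibble(team = c("Team A", "Team B", "Division X", "Team C"))
stats <- tibble(W = c("10", "8", "W", "5"), L = c("6", "8", "L", "11"))

# bind_cols() matches rows by position, so the pieces must have equal row counts
stopifnot(nrow(teams) == nrow(stats))

standings <- bind_cols(teams, stats) %>%
  filter(W != "W") %>%                    # drop the repeated header row
  mutate(across(c(W, L), as.integer))     # convert text columns to integers

standings
```

Because bind_cols() pairs rows purely by position, it is worth checking that the two pieces line up before binding them, and inspecting the result before deciding which rows to filter out.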
