Question

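I am trying to extract a table from https://www.basketball-reference.com/leagues/NBA_2018.html. The table I want is "Team Per Game Stats". The page contains several tables, but when I try to extract them I only get the first two tables on the page.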

How can I get the table I want using R? The code I used is below.

library(rvest)

url <- "https://www.basketball-reference.com/leagues/NBA_2018.html"

# Read the page
html <- read_html(url)

# Parse every table that rvest can see in the document
tables <- html %>% html_table(fill = TRUE)

View(tables)

Answer 1

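The table is commented out in the page source. You can use XPath to select the comment nodes, re-parse their text as HTML, and then extract the table you need.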

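library(rvest)

page <- read_html('https://www.basketball-reference.com/leagues/NBA_2018.html')

df <- page %>%
  html_nodes(xpath = '//comment()') %>%   # select every HTML comment node
  html_text() %>%                         # extract the commented-out markup as text
  paste(collapse = '') %>%                # join all comments into one string
  read_html() %>%                         # parse that string as HTML
  html_node('#team-stats-per_game') %>%   # pick the table by its id
  html_table()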
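
To see why the question's code misses the table, here is a minimal offline sketch of the same technique. The HTML string, the `visible` id, and the numbers in it are made up for illustration; only the `team-stats-per_game` id is taken from the answer above. It assumes a reasonably recent rvest is installed.

```r
library(rvest)

# Hypothetical miniature page: one ordinary table, and one table
# wrapped in an HTML comment (the situation described in the answer).
sample_html <- '<html><body>
<table id="visible"><tr><th>Team</th><th>W</th></tr><tr><td>A</td><td>1</td></tr></table>
<div>
<!--
<table id="team-stats-per_game"><tr><th>Team</th><th>PTS</th></tr><tr><td>A</td><td>100</td></tr><tr><td>B</td><td>95</td></tr></table>
-->
</div>
</body></html>'

page <- read_html(sample_html)

# Only tables that are real nodes in the parsed document are found here;
# the commented-out table is not among them.
visible_tables <- page %>% html_nodes("table") %>% html_table()

# Collect the comment text, parse it as HTML, then select the table by id.
hidden <- page %>%
  html_nodes(xpath = "//comment()") %>%
  html_text() %>%
  paste(collapse = "") %>%
  read_html() %>%
  html_node("#team-stats-per_game") %>%
  html_table()

print(length(visible_tables))
print(hidden)
```

Selecting by id after re-parsing keeps the result independent of how many other commented-out tables the page contains.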
