I can't read the data correctly from the URL https://www.basketball-reference.com/leagues/NBA_2020_totals.html#totals_stats::pts. This is my code:
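library(rvest)
# The "#totals_stats::pts" fragment is never sent to the server, so it does not select a table
url <- "https://www.basketball-reference.com/leagues/NBA_2020_totals.html#totals_stats::pts"
# as.data.frame and stringsAsFactors are not read_html() arguments
pagina <- read_html(url, as.data.frame=T, stringsAsFactors = TRUE,
encoding = "utf-8")
pagina %>%
html_nodes("table") %>%  # every <table> element on the page
.[[1]] %>%               # the first one is the player totals table
html_table(fill=T) -> x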
This reads the table, but I don't know why it also pulls in several rows like these:
Rk Player Pos Age Tm G GS MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% eFG% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PTS
54 Rk Player Pos Age Tm G GS MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% eFG% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PTS
77 Rk Player Pos Age Tm G GS MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% eFG% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PTS
102 Rk Player Pos Age Tm G GS MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% eFG% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PTS
133 Rk Player Pos Age Tm G GS MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% eFG% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PTS
162 Rk Player Pos Age Tm G GS MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% eFG% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PTS
189 Rk Player Pos Age Tm G GS MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% eFG% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PTS
218 Rk Player Pos Age Tm G GS MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% eFG% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PTS
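I get the player rows, but I also get those rows. I don't know whether they are players that were read incorrectly, or just stray rows that appear because I did something wrong in my code. I would like to either delete those rows (which, as you can see, show up at seemingly random positions) or change the reading code so that I don't get them at all.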
Thanks in advance.
Alberto
Answer (score: 1)
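Those rows are not players: they are copies of the header row repeated inside the table body. Filter them out and keep only the player rows.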
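library(rvest)
library(dplyr)
# The URL fragment is not needed to fetch the page
url <- "https://www.basketball-reference.com/leagues/NBA_2020_totals.html"
webpage <- url %>% read_html
webpage %>%
html_table() %>%              # parse every <table> on the page into a list
.[[1]] %>%                    # keep the first table (the player totals)
filter(!grepl('Rk', Rk)) %>%  # drop the repeated header rows, whose Rk cell reads "Rk"
type.convert(as.is = TRUE)    # re-guess column types now that the text rows are gone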
# Rk Player Pos Age Tm G GS MP FG FGA FG% ...
#1 1 Steven Adams C 26 OKC 58 58 1564 262 443 0.591 ...
#2 2 Bam Adebayo PF 22 MIA 65 65 2235 408 719 0.567 ...
#3 3 LaMarcus Aldridge C 34 SAS 53 53 1754 391 793 0.493 ...
#4 4 Nickeil Alexander-Walker SG 21 NOP 41 0 501 77 227 0.339 ...
#5 5 Grayson Allen SG 24 MEM 30 0 498 79 176 0.449 ...
#6 6 Jarrett Allen C 21 BRK 64 58 1647 267 413 0.646 ...
#7 7 Kadeem Allen SG 27 NYK 10 0 117 19 44 0.432 ...
#8 8 Al-Farouq Aminu PF 29 ORL 18 2 380 25 86 0.291 ...
#9 9 Justin Anderson SF 26 BRK 3 0 17 1 6 0.167 ...
#10 10 Kyle Anderson PF 26 MEM 59 20 1140 138 280 0.493 ...
#11 11 Ryan Anderson PF 31 HOU 2 0 14 2 7 0.286 ...
#...
#...
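If you would rather not depend on dplyr, the same cleanup can be done in base R. The sketch below runs on a small hypothetical data frame standing in for the scraped table (a few values copied from the output above, plus one repeated header row); with the real data you would apply the same two steps to `x` from the question. It uses an exact comparison instead of `grepl()`, which is a swapped-in choice: it only drops rows whose `Rk` cell is exactly `"Rk"`.

```r
# Hypothetical stand-in for the scraped table: every column is character
# because a copy of the header row ("Rk", "Player", ...) sits in the body
x <- data.frame(
  Rk     = c("1", "2", "Rk", "3"),
  Player = c("Steven Adams", "Bam Adebayo", "Player", "LaMarcus Aldridge"),
  FG     = c("262", "408", "FG", "391"),
  stringsAsFactors = FALSE
)

# Keep rows whose Rk cell is not the literal header text;
# an exact comparison is stricter than grepl("Rk", Rk), which matches substrings
clean <- x[x$Rk != "Rk", ]

# With the text rows gone, re-guess the column types (character -> integer)
clean <- type.convert(clean, as.is = TRUE)
rownames(clean) <- NULL
clean
```

Until the header rows are removed, numeric columns such as `FG` stay character, which is why the type conversion has to come after the filter.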
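The question also asks how to change the reading code so those rows never appear. One option is to delete them from the parsed HTML before calling `html_table()`. This is a minimal sketch under an assumption you should verify in the page source: that the repeated header rows inside `<tbody>` carry `class="thead"`, and that the table's id is `totals_stats` (as the URL fragment suggests). The HTML string below is a hypothetical miniature of that markup, and `html_element()`/`html_elements()` require rvest 1.0 or later.

```r
library(rvest)  # read_html(), html_element(), html_elements(), html_table()
library(xml2)   # xml_remove()

# Hypothetical miniature of the page. ASSUMPTION: repeated header rows in
# <tbody> have class="thead"; check the real markup before relying on it
html <- '<table id="totals_stats">
  <thead><tr><th>Rk</th><th>Player</th><th>FG</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>Steven Adams</td><td>262</td></tr>
    <tr class="thead"><th>Rk</th><th>Player</th><th>FG</th></tr>
    <tr><td>2</td><td>Bam Adebayo</td><td>408</td></tr>
  </tbody>
</table>'

page <- read_html(html)

# Delete the repeated header rows from the parsed document, so they
# never reach the data frame
xml_remove(html_elements(page, "tr.thead"))

# html_table() on a single <table> node returns one tibble and,
# by default, converts numeric-looking columns
totals <- html_table(html_element(page, "table#totals_stats"))
totals
```

For the real page, replace the string with `read_html("https://www.basketball-reference.com/leagues/NBA_2020_totals.html")`. If the selector matches nothing, `xml_remove()` does nothing and you can fall back to filtering afterwards.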