R web scraping outputs "character (empty)"

Time: 2021-05-06 10:52:24

Tags: r rvest

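I am new to R.

I need help assigning web-scraped data to a variable named `salary`. For some reason, `salary` shows up as "character (empty)" in my environment. I used SelectorGadget to find the HTML nodes.

I would appreciate it if someone could explain what is going wrong. Thanks!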

library(rvest)
library(tidyverse)
library(magrittr)

nba_player_salaries <- read_html("https://hoopshype.com/salaries/players/2018-2019/")

salary <- nba_player_salaries %>%
  html_nodes("tbody .hh-salaries-sorted") %>%
  html_text2()
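
One way to narrow this down: `html_nodes()` returns an empty node set when the selector matches nothing in the HTML that `read_html()` received, and `html_text2()` on an empty set gives `character(0)`. A selector found in the browser may not match the raw HTML, for example when scripts add classes or elements after the page loads. The sketch below uses made-up inline HTML (the class names `name`, `salary`, and `no-such-class` are assumptions, not the real hoopshype markup) to show the two cases side by side.

```r
library(rvest)

# Made-up HTML standing in for a salary table; NOT the real hoopshype markup.
page <- read_html('<table><tbody>
  <tr><td class="name">Player A</td><td class="salary">$1,000</td></tr>
  <tr><td class="name">Player B</td><td class="salary">$2,000</td></tr>
</tbody></table>')

# A selector that matches nothing yields an empty node set ...
no_match <- html_nodes(page, "tbody .no-such-class")
length(no_match)

# ... and extracting text from an empty node set gives character(0).
no_match_text <- html_text2(no_match)

# A selector that does match returns one string per matched node.
found_text <- html_text2(html_nodes(page, "tbody .salary"))
found_text
```

Running `length()` on the result of `html_nodes()` for the real page would show whether the selector is the problem before any text extraction happens.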

1 answer:

Answer 0: (score: 0)

You can extract the table directly from the page:

library(rvest)
library(dplyr)

url <- 'https://hoopshype.com/salaries/players/2018-2019/'

url %>%
  read_html() %>%
  html_table() %>%
  .[[1]] %>%
  setNames(.[1, ]) %>% #Since column names are in 1st row
  slice(-1) %>%        #Remove 1st row
  select(-1)           #Remove 1st column

#   Player            `2018/19`   `2018/19(*)`
#   <chr>             <chr>       <chr>       
# 1 Stephen Curry     $37,457,154 $38,320,489 
# 2 Russell Westbrook $35,665,000 $36,487,029 
# 3 Chris Paul        $35,654,150 $36,475,929 
# 4 LeBron James      $35,654,150 $36,475,929 
# 5 Kyle Lowry        $32,700,000 $33,453,690 
# 6 Blake Griffin     $31,873,932 $32,608,582 
# 7 Gordon Hayward    $31,214,295 $31,933,741 
# 8 James Harden      $30,570,000 $31,274,596 
# 9 Paul George       $30,560,700 $31,265,082 
#10 Mike Conley       $30,521,115 $31,224,584 
# … with 566 more rows
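
The salary columns above come back as character (`<chr>`), so they would need converting before any arithmetic. A minimal sketch, assuming the values always look like `$37,457,154` (a dollar sign plus comma separators); the vector name `salary_chr` is made up for illustration and holds three values copied from the output above.

```r
# Three salary strings copied from the printed output above.
salary_chr <- c("$37,457,154", "$35,665,000", "$35,654,150")

# Strip the dollar sign and the thousands separators, then convert to numeric.
salary_num <- as.numeric(gsub("[$,]", "", salary_chr))

salary_num
```

`readr::parse_number()` is an alternative that handles the same formatting, if the tidyverse is already loaded.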