Web scraping problem with rvest and SelectorGadget

Time: 2019-10-11 14:27:58

Tags: r web-scraping tidyverse rvest

I am trying to scrape MLB draft data from the following website:

https://www.baseballamerica.com/draft-history/mlb-draft-database/#/

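The problem is that I can't seem to find the right class to pass to rvest::html_nodes() to isolate the table. Using Chrome's "Inspect" tool, I have tried every class that appears to identify the table: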


library(tidyverse)
library(rvest)

url <- "https://www.baseballamerica.com/draft-history/mlb-draft-database/#/"

url %>% 
  read_html() %>% 
  html_nodes("table-container")

I have also tried "search-table draft-search-table", but I keep getting the same result: "{xml_nodeset(0)}". Any help would be greatly appreciated!

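1 Answer:

Answer 0 (score: 1)

The content is loaded dynamically from an API call that returns JSON, so the table is not present in the HTML that read_html() downloads. You can get the data by sending a POST request to that API with httr: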

library(httr)

# The endpoint expects a JSON body, so set the content type explicitly.
headers <- c("Content-Type" = "application/json")
data <- '{"SigningBonusMin":"0","SigningBonusMax":"0","Year":"2019","Round":"1","TeamId":"0","FourYearSchoolType":"false","JuniorCollegeType":"false","HighSchoolType":"false","OtherSchoolType":"false","OverallNumber":"0","pageId":"1","paid":"false"}'
# POST the search parameters and keep the Results element of the parsed JSON.
resp <- httr::POST(
  url = "https://www.baseballamerica.com/umbraco/api/draftdatabaseapi/advancedsearch",
  httr::add_headers(.headers = headers), body = data, encode = "json"
)
r <- content(resp)$Results
print(r)
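
As a side note on the original attempt, here is a minimal offline sketch of why the selectors returned an empty node set. The markup in static_html is a hypothetical stand-in, not the real page: it assumes the served HTML contains an empty container that JavaScript fills in later. It also shows that a bare name such as "table-container" is treated as an element name, while a class needs a leading dot.

```r
library(rvest)

# Hypothetical stand-in for the page as served (an assumption, not the real markup):
# the container is present, but its table is added later by JavaScript.
static_html <- '<html><body><div class="table-container"></div></body></html>'
page <- read_html(static_html)

# "table-container" is read as an element (tag) name, so nothing matches.
by_tag <- html_nodes(page, "table-container")

# ".table-container" is a class selector and finds the <div>.
by_class <- html_nodes(page, ".table-container")

# Even with a valid class selector, the static markup holds no <table>.
tables <- html_nodes(page, ".table-container table")

length(by_tag)
length(by_class)
length(tables)
```

Under that assumption, fixing the selector alone would not be enough, because rvest does not run JavaScript; that is why the answer queries the JSON API directly.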