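I'm trying to scrape MLB draft data from the following site:
https://www.baseballamerica.com/draft-history/mlb-draft-database/#/
The problem is that I can't seem to find the right class to pass to rvest::html_nodes() to isolate the table. Using Chrome's "Inspect" tool, I've tried every class that looks like it identifies the table: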
library(tidyverse)
library(rvest)
url <- "https://www.baseballamerica.com/draft-history/mlb-draft-database/#/"
url %>%
read_html() %>%
html_nodes("table-container")
I also tried "search-table draft-search-table", but I keep getting the same result: "{xml_nodeset (0)}". Any help would be greatly appreciated!
Answer 0 (score: 1)
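The content is loaded dynamically by an API call that returns JSON, so the table is not in the HTML that read_html() downloads. You can send an httr POST request to that API to get the data: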
library(httr)
headers = c('Content-Type'='application/json')
data='{"SigningBonusMin":"0","SigningBonusMax":"0","Year":"2019","Round":"1","TeamId":"0","FourYearSchoolType":"false","JuniorCollegeType":"false","HighSchoolType":"false","OtherSchoolType":"false","OverallNumber":"0","pageId":"1","paid":"false"}'
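# Send the JSON string as the request body and parse the JSON response
resp <- httr::POST(
  url = 'https://www.baseballamerica.com/umbraco/api/draftdatabaseapi/advancedsearch',
  httr::add_headers(.headers = headers),
  body = data,
  encode = "json"
)
# The matching draft picks are under the "Results" element
r <- content(resp)$Results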
print(r)
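If you want a table rather than a nested list, something along these lines might work. This is only a sketch: it assumes `r` is a list of records that all share the same fields with scalar values, which I have not verified against the real response. `records_to_df` is a helper name I made up, and the field names in the example data are hypothetical stand-ins, not the API's actual fields.

```r
# Convert a list of parsed JSON records (named lists) into a data frame.
# Assumes every record has the same field names, each holding a scalar.
records_to_df <- function(records) {
  rows <- lapply(records, function(rec) {
    # JSON null is parsed as NULL; swap in NA so the column is kept.
    rec <- lapply(rec, function(v) if (is.null(v)) NA else v)
    as.data.frame(rec, stringsAsFactors = FALSE)
  })
  # Stack the one-row data frames; returns NULL if there are no records.
  do.call(rbind, rows)
}

# Hypothetical records standing in for `r`; the real field names may differ.
example <- list(
  list(Name = "Player A", Round = 1, Bonus = NULL),
  list(Name = "Player B", Round = 1, Bonus = 2500000)
)
draft_df <- records_to_df(example)
print(draft_df)
```

With the real response you would call records_to_df(r) instead. If some fields turn out to be nested lists, they would need to be flattened or dropped first.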