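I am scraping data from https://rotogrinders.com/lineups/nfl?site=draftkings. Currently I pull the page in with myData <- read_html("https://rotogrinders.com/lineups/nfl?site=draftkings") and then use html_nodes to extract the data I want. I am trying to change the slate selection menu and then grab the data. The XPath of the menu I want to change is //select[@name='slate_name'].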
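My research leads me to believe I need one of the following, but I am not sure how to go about it, because the menu is not part of a form and has no submit button; the page reloads automatically once a new option is selected: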
httr::POST
rvest::html_session
RSelenium
I am not familiar with the RSelenium library, so ideally I am looking for a solution that uses httr or rvest.
Answer 0 (score: 6)
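You already have all the information from read_html(). The slate-name dropdown only filters the schedule via JavaScript. I suggest you fetch all the data and filter it yourself. Hope that helps.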
library(magrittr)
library(rvest)
#> Loading required package: xml2
url <- "https://rotogrinders.com/lineups/nfl?site=draftkings"
myData <- read_html(url)
myData %>%
  html_nodes(".teams") %>%
  html_text() %>%
  stringr::str_squish()
#>  [1] "New York NYJ Jets Cleveland CLE Browns"
#>  [2] "New Orleans NOS Saints Atlanta ATL Falcons"
#>  [3] "Buffalo BUF Bills Minnesota MIN Vikings"
#>  [4] "Denver DEN Broncos Baltimore BAL Ravens"
#>  [5] "Indianapolis IND Colts Philadelphia PHI Eagles"
#>  [6] "Cincinnati CIN Bengals Carolina CAR Panthers"
#>  [7] "San Francisco SFO 49ers Kansas City KCC Chiefs"
#>  [8] "Green Bay GBP Packers Washington WAS Redskins"
#>  [9] "Oakland OAK Raiders Miami MIA Dolphins"
#> [10] "New York NYG Giants Houston HOU Texans"
#> [11] "Tennessee TEN Titans Jacksonville JAC Jaguars"
#> [12] "Los Angeles LAC Chargers Los Angeles LAR Rams"
#> [13] "Chicago CHI Bears Arizona ARI Cardinals"
#> [14] "Dallas DAL Cowboys Seattle SEA Seahawks"
#> [15] "New England NEP Patriots Detroit DET Lions"
#> [16] "Pittsburgh PIT Steelers Tampa Bay TBB Buccaneers"
Created on 2018-09-22 by the reprex package (v0.2.1)
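As a rough illustration of "filter it yourself", here is a minimal base R sketch. It assumes you already know which team abbreviations belong to the slate you care about; the `slate_teams` vector is a hypothetical example and is not taken from the page. The game strings are copied from the output above.

```r
# Three of the game strings returned by html_text() in the snippet above.
games <- c(
  "New York NYJ Jets Cleveland CLE Browns",
  "New Orleans NOS Saints Atlanta ATL Falcons",
  "Dallas DAL Cowboys Seattle SEA Seahawks"
)

# Hypothetical slate: the team abbreviations you want to keep (an assumption).
slate_teams <- c("DAL", "SEA")

# Keep a game when any of its space-separated tokens is a wanted abbreviation.
in_slate <- vapply(
  strsplit(games, " ", fixed = TRUE),
  function(tokens) any(tokens %in% slate_teams),
  logical(1)
)
filtered <- games[in_slate]
print(filtered)
```

Matching on whole tokens rather than substrings avoids accidental hits inside city or team names.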
Edit
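You can still get all the relevant information with read_html(). You need to take the IDs from the dropdown and then parse the JavaScript string that holds all the salaries. I did the first part; the rest is up to you ;-)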
library(tidyverse, quietly = TRUE)
library(rvest, warn.conflicts = FALSE)
#> Loading required package: xml2
url <- "https://rotogrinders.com/lineups/nfl?site=draftkings"
raw <- read_html(url)
# Helper: reduce one slate entry to its display name and import ID
parse_json <- function(x) tibble(name = x$name, importID = x$importId)
# Read the slate metadata (JSON stored in the "value" attribute) and tabulate it
raw %>%
  html_nodes(".slate-data") %>%
  html_attr(name = "value") %>%
  jsonlite::fromJSON() %>%
  purrr::map_df(parse_json)
#> # A tibble: 10 x 2
#>    name                                                importID
#>    <chr>                                               <chr>
#>  1 1:00pm: Classic: 13 Games                           21505
#>  2 8:20pm: Classic (Thu-Mon): 16 Games                 21576
#>  3 1:00pm: Classic (Sun-Mon): 15 Games                 21586
#>  4 1:00pm: Tiers (NFL Tiers): 14 Games                 21589
#>  5 1:00pm: Classic (Early Only): 10 Games              21581
#>  6 4:05pm: Classic (Afternoon Only): 3 Games           21630
#>  7 4:25pm: Classic (Afternoon Turbo): 2 Games          21631
#>  8 8:20pm: Classic (Primetime): 2 Games                21645
#>  9 4:25pm: Showdown Captain Mode (DAL vs SEA): 1 Games 21632
#> 10 8:20pm: Showdown Captain Mode (NE vs DET): 1 Games  21644
raw %>%
  html_nodes(".select") %>%
  html_nodes("script") %>%
  html_text() %>%
  stringr::str_squish() %>%
  substr(1, 1000)
#> [1] "window.slateSelect = window.createReactComponent(SlateSelectRadnor, { slates: {\"All Games\":{\"games\":[{\"scheduleId\":\"45755\",\"teamAwayId\":\"12\",\"teamHomeId\":\"3\"},{\"scheduleId\":\"45756\",\"teamAwayId\":\"23\",\"teamHomeId\":\"21\"},{\"scheduleId\":\"45757\",\"teamAwayId\":\"9\",\"teamHomeId\":\"8\"},{\"scheduleId\":\"45758\",\"teamAwayId\":\"25\",\"teamHomeId\":\"1\"},{\"scheduleId\":\"45759\",\"teamAwayId\":\"14\",\"teamHomeId\":\"19\"},{\"scheduleId\":\"45760\",\"teamAwayId\":\"2\",\"teamHomeId\":\"22\"},{\"scheduleId\":\"45761\",\"teamAwayId\":\"31\",\"teamHomeId\":\"26\"},{\"scheduleId\":\"45762\",\"teamAwayId\":\"7\",\"teamHomeId\":\"20\"},{\"scheduleId\":\"45763\",\"teamAwayId\":\"27\",\"teamHomeId\":\"10\"},{\"scheduleId\":\"45764\",\"teamAwayId\":\"18\",\"teamHomeId\":\"13\"},{\"scheduleId\":\"45765\",\"teamAwayId\":\"16\",\"teamHomeId\":\"15\"},{\"scheduleId\":\"45766\",\"teamAwayId\":\"28\",\"teamHomeId\":\"30\"},{\"scheduleId\":\"45767\",\"teamAwayId\":\"5\",\"teamHomeId\":\"29\"},{\"scheduleId\":\"45768\",\"teamAwayId\":\"17\",\"teamHomeId\":\"32\"},{\"scheduleId\":\"45769\",\"teamAwayId\":\"11\",\"teamHomeId\":\"6\"},{\"scheduleId\":\"45770\","
Created on 2018-09-23 by the reprex package (v0.2.1)
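For the remaining part, one possible approach is to cut the JSON object that follows `slates:` out of the script text and hand it to jsonlite. The sketch below is only an illustration: `script_text` is a shortened, hand-made stand-in that mimics the start of the string shown above (the trailing `other: 1` part is invented to close it), and `extract_braced` is a hypothetical helper, not part of any package. It also assumes no unbalanced braces occur inside JSON string values.

```r
library(jsonlite)

# Hypothetical, shortened stand-in for the script text printed above; the real
# string is much longer and its full structure is not shown in the output.
script_text <- paste0(
  "window.slateSelect = window.createReactComponent(SlateSelectRadnor, { ",
  "slates: {\"All Games\":{\"games\":[",
  "{\"scheduleId\":\"45755\",\"teamAwayId\":\"12\",\"teamHomeId\":\"3\"},",
  "{\"scheduleId\":\"45756\",\"teamAwayId\":\"23\",\"teamHomeId\":\"21\"}]}}, ",
  "other: 1 });"
)

# Hypothetical helper: return the first balanced {...} block found at or
# after character position `from`, by counting opening and closing braces.
extract_braced <- function(txt, from) {
  chars <- strsplit(substring(txt, from), "", fixed = TRUE)[[1]]
  depth <- 0L
  start <- NA_integer_
  for (i in seq_along(chars)) {
    if (chars[i] == "{") {
      if (is.na(start)) start <- i
      depth <- depth + 1L
    } else if (chars[i] == "}" && !is.na(start)) {
      depth <- depth - 1L
      if (depth == 0L) return(paste(chars[start:i], collapse = ""))
    }
  }
  stop("no balanced block found")
}

# Locate the "slates:" key, then parse the object that follows it.
key_pos <- as.integer(regexpr("slates:", script_text, fixed = TRUE))
if (key_pos < 0L) stop("'slates:' key not found")
slates_json <- extract_braced(script_text, key_pos + nchar("slates:"))
slates <- fromJSON(slates_json)

# Each slate should then carry a data frame of games (IDs stay character).
games <- slates[["All Games"]]$games
print(games)
```

On the real page you would pass the full `html_text()` result instead of `script_text`, without the `substr(1, 1000)` truncation, and check the names of `slates` against the dropdown entries.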