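The code below works if I remove Sys.sleep() from the map() call. I tried to research the error ("Don't know how to pluck from a closure"), but I haven't found much on the topic.
Does anyone know where I can find documentation on this error, or can anyone help explain why it happens and how to prevent it?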
library(rvest)
library(tidyverse)
library(stringr)
# pages 0 to 18 of the player listing (use fewer pages, e.g. 0:2, for a quick test)
page <- (0:18)
# no need to create a list. Just a vector
urls = paste0("https://www.mlssoccer.com/players?page=", page)
# define this function that collects the player's name from a url
get_the_names = function(url){
  url %>%
    read_html() %>%
    html_nodes("a.name_link") %>%
    html_text()
}
# map the urls to the function that gets the names
players = map(urls, get_the_names) %>%
  # turn into a single character vector
  unlist() %>%
  # make lower case
  tolower() %>%
  # replace spaces with hyphens
  str_replace_all(" ", "-")
# Now create a vector of player urls
player_urls = paste0("https://www.mlssoccer.com/players/", players )
# define a function that reads the 3rd table of the url
get_the_summary_stats <- function(url){
  url %>%
    read_html() %>%
    html_nodes("table") %>%
    html_table() %>% .[[3]]
}
# let's read only 5 players to speed things up [otherwise it takes a significant amount of time to run...]
a_few_players <- player_urls[1:5]
# get the stats
tables = a_few_players %>%
  # important step so I can name the rows I get in the table
  set_names() %>%
  # map the player urls to the function that reads the 3rd table
  # note the `safely` wrap around the `get_the_summary_stats` function
  # since there are players with no stats, which causes an error (e.g. brenden-aaronson)
  # the output will be a list of lists [result and error]
  map(., ~{ Sys.sleep(5)
    safely(get_the_summary_stats) }) %>%
  # collect only the `result` output (the table) INTO A DATA FRAME
  # There is also an `error` output
  # also, name each row with the player's name
  map_df("result", .id = "player") %>%
  # keep only the player name (remove the www.mls.... part)
  mutate(player = str_replace(player, "https://www.mlssoccer.com/players/", "")) %>%
  as_tibble()
tables <- tables %>% separate(Match,c("awayTeam","homeTeam"), extra= "drop", fill = "right")
Answer 0 (score: 0)
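purrr::safely(...) returns a function, so your map(., ~{ Sys.sleep(5); safely(get_the_summary_stats) }) returns a list of functions, not any data. In R, a "closure" is a function together with its enclosing environment, so the error means that the following map_df("result") step was asked to extract "result" from a function.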
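To make the difference concrete, here is a minimal sketch that needs no network access. The function `risky` is a hypothetical stand-in for `get_the_summary_stats`; it is not part of the original code.

```r
library(purrr)

# Hypothetical stand-in for get_the_summary_stats(): fails on negative input
risky <- function(x) {
  if (x < 0) stop("negative input")
  sqrt(x)
}

# safely() returns a wrapped function (a closure), not data
safe_risky <- safely(risky)
is.function(safe_risky)

# Calling the wrapped function returns a list with `result` and `error`
ok  <- safe_risky(4)
bad <- safe_risky(-1)

ok$result                      # 2
is.null(ok$error)              # no error was captured
is.null(bad$result)            # the failed call has no result
inherits(bad$error, "error")   # the condition object is kept instead

# Building the wrapper without calling it yields a list of functions,
# which is what a later "result" extraction cannot handle
not_called <- map(c(4, -1), ~ safely(risky))
all(map_lgl(not_called, is.function))
```

The last two lines reproduce the shape of the problem in the question: every element is a function, so there is no `result` field to extract.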
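The tilde notation is a tidyverse-specific, more concise way of writing an anonymous function. Normally (e.g., with lapply) you would write lapply(mydata, function(x) get_the_summary_stats(x)). In tilde notation, the same thing is written as map(mydata, ~ get_the_summary_stats(.)).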
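As a quick check of that equivalence, the sketch below compares the two spellings on toy data. `describe` and `mydata` are hypothetical placeholders, not names from the question.

```r
library(purrr)

# Hypothetical stand-in for get_the_summary_stats()
describe <- function(x) paste("value:", x)

mydata <- c("a", "b", "c")

# Explicit anonymous function with base lapply()
with_lapply <- lapply(mydata, function(x) describe(x))

# purrr formula shorthand: `.` (or `.x`) refers to the current element
with_formula <- map(mydata, ~ describe(.))
with_dot_x   <- map(mydata, ~ describe(.x))

identical(with_lapply, with_formula)
identical(with_formula, with_dot_x)
```

In both cases the function is actually called on each element, which is the step missing from the code in the question.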
So, rewritten:
... %>% map(~ { Sys.sleep(5); safely(get_the_summary_stats)(.); })
From a comment by @r2evans
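The following is a rough, offline sketch of the same pattern rather than the answerer's own code. It assumes a hypothetical `fake_stats` function in place of `get_the_summary_stats`, made-up player names, and a much shorter pause than the 5 seconds used in the question. One variation it shows is wrapping with `safely()` once, outside the loop, and calling the wrapped function inside it.

```r
library(purrr)

# Hypothetical stand-in for get_the_summary_stats(): no network access,
# and it fails for one "player" to mimic a page without a stats table
fake_stats <- function(player) {
  if (player == "no-stats") stop("no table found")
  data.frame(player_name = player, goals = nchar(player))
}

# Wrap once, then call the wrapped function for every element
safe_stats <- safely(fake_stats)

players <- c("alice", "no-stats", "bob")

results <- players %>%
  set_names() %>%
  map(~ {
    Sys.sleep(0.1)   # short pause for the demo; the question uses 5 seconds
    safe_stats(.x)   # the call that was missing in the question
  })

# Each element is list(result = ..., error = ...)
ok_tables <- compact(map(results, "result"))             # drop NULL results
failed    <- names(keep(results, ~ !is.null(.x$error)))  # who failed
```

Because every element of `results` is now a list with a `result` field, an extraction step such as the question's `map_df("result", .id = "player")` has something to pull out, and `failed` records which inputs raised an error.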