Scraping a "string" code off a URL and putting it into a vector using rvest in R

Date: 2016-07-05 20:21:35

Tags: r rvest

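I am new to R and rvest. Two days ago I got help with this code, which scrapes all the player names, and it works well. Now I am trying to add code to the function "fetch_current_players" so that it also creates a vector of the player codes for that website (taken from the URLs). Any help would be appreciated, as I have spent a day googling, reading, and watching YouTube videos trying to teach myself. Thanks!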

library(rvest) 
library(purrr) # flatten/map/safely
library(dplyr) # progress bar

fetch_current_players <- function(letter){

  URL <- sprintf("http://www.baseball-reference.com/players/%s/", letter)
  pg <- read_html(URL)

  if (is.null(pg)) return(NULL)
  player_data <- html_nodes(pg, "b a")
  player_code<-html_attr(html_nodes(pg, "b a"), "href") #I'm trying to scrape the URL as well as the player name
  substring(player_code, 12, 20) #Strips the code out of the URL
  html_text(player_data)
  player_code #Not sure how to create vector of all codes from all 27 webpages
}

pb <- progress_estimated(length(letters))
player_list <- flatten_chr(map(letters, function(x) {
  pb$tick()$print()
  fetch_current_players(x)
}))

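1 Answer:

Answer 0 (score: 0)

I like to keep things like this simple and readable, and there is nothing wrong with a for loop. This code returns the names and codes in a simple data frame.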

library(rvest) 
library(purrr) # flatten/map/safely
library(dplyr) # progress bar

fetch_current_players <- function(letter){
  URL <- sprintf("http://www.baseball-reference.com/players/%s/", letter)
  pg <- read_html(URL)

  if (is.null(pg)) return(NULL)
  player_data <- html_nodes(pg, "b a") # links inside <b> elements
  player_code <- html_attr(player_data, "href") # relative URL of each player page
  player_code <- substring(player_code, 12, 20) # strips the code out of the URL
  player_names <- html_text(player_data)
  # keep both columns as character vectors rather than factors
  data.frame(code = player_code, name = player_names, stringsAsFactors = FALSE)
}

pb <- progress_estimated(length(letters))

# start from NULL so a player_list left over from an earlier run is not appended to
player_list <- NULL
for (x in letters) {
  pb$tick()$print()
  player_list <- rbind(player_list, fetch_current_players(x)) # rbind() skips the initial NULL
}
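
One thing to watch with substring(player_code, 12, 20): it assumes every code is exactly nine characters long. Below is a minimal sketch of a pattern-based alternative, assuming the hrefs look like "/players/<letter>/<code>.shtml". The helper extract_player_code() and the stand-in fake_fetch() are hypothetical names made up for illustration, and the sample hrefs and player rows are illustrative values, not scraped output. The sketch uses base R only, so it runs without touching the website.

```r
# Sample hrefs, assumed to have the form "/players/<letter>/<code>.shtml".
hrefs <- c("/players/a/aardsda01.shtml", "/players/a/abadfe01.shtml")

# Hypothetical helper (not part of rvest): take the last path component
# and drop the ".shtml" extension, whatever the length of the code.
extract_player_code <- function(href) {
  sub("\\.shtml$", "", basename(href))
}

# Fixed positions keep a trailing "." when the code has only 8 characters.
fixed_width <- substring(hrefs, 12, 20)
by_pattern  <- extract_player_code(hrefs)

# Offline stand-in for fetch_current_players(), returning made-up rows.
fake_fetch <- function(letter) {
  data.frame(code = paste0(letter, "example01"),
             name = paste("Player", toupper(letter)),
             stringsAsFactors = FALSE)
}

# Build one data frame per letter, then bind them all in a single call.
player_list <- do.call(rbind, lapply(c("a", "b", "c"), fake_fetch))

print(fixed_width)
print(by_pattern)
print(player_list)
```

If the pattern-based version suits the real hrefs, the substring() line inside fetch_current_players() could be swapped for a call to the helper. The do.call(rbind, lapply(...)) form is an alternative to the for loop that avoids growing the data frame one letter at a time; player_list$code then gives the vector of all codes.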