I'm trying to get R to loop through player profiles on transfermarkt.com. I first use the following to get the squad's player URLs.
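# Load rvest for read_html(), html_nodes(), html_text(), html_attr() and the %>% pipe
library(rvest)
#/ Add the Team's URL to scrape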
TeamScrape <- read_html("http://www.transfermarkt.com/jumplist/startseite/verein/2778")
#// Get Club Name
ClubName <- TeamScrape %>%
html_nodes(".spielername-profil") %>%
html_text()
#// Get All Player URLs
PlayerURLs <- TeamScrape %>%
html_nodes(".spielprofil_tooltip") %>%
html_attr("href")
PlayerURLs <- unique(PlayerURLs)
PlayerURLs <- na.omit(PlayerURLs)
PlayerURLs <- paste0("http://www.transfermarkt.com", PlayerURLs)
PlayerLinks = data.frame(ClubName, PlayerURLs)
This gives me a data.frame containing the URLs I want to loop over with my next scraper, the 'Player Profile Scraper'.
#/ Add the Player’s URL that you want to scrape
URLLink <- PlayerURLs[13]
PlayerTest <- read_html(URLLink)
#// Squad No
SquadNo <- PlayerTest %>%
html_nodes(".rueckennummer-profil") %>%
html_text()
#// Name
Name <- PlayerTest %>%
html_nodes(".spielername-profil") %>%
html_text()
#// Nationality
Nationality <- PlayerTest %>%
html_nodes(".flaggenrahmen+ span") %>%
html_text()
#// Club
Club <- PlayerTest %>%
html_nodes(".vereinprofil_tooltip+ .vereinprofil_tooltip") %>%
html_text()
#// Position
Position <- PlayerTest %>%
html_nodes(".list+ .list tr:nth-child(3) td") %>%
html_text()
#// DOB
DOB <- PlayerTest %>%
html_nodes(".wsnw") %>%
html_text()
#// Age
Age <- PlayerTest %>%
html_nodes(".profilheader .hide-for-small td") %>%
html_text() %>%
as.numeric()
#// Value
Value <- PlayerTest %>%
html_nodes(".marktwert a") %>%
html_text()
#// Matches Played this Season
Matches <- PlayerTest %>%
html_nodes(".hide.hide-for-small+ .zentriert") %>%
html_text() %>%
as.numeric()
#// Goals Scored this Season
Goals <- PlayerTest %>%
html_nodes("#yw1 tfoot .zentriert:nth-child(4)") %>%
html_text() %>%
as.numeric()
#// Assists Made this Season
Assists <- PlayerTest %>%
html_nodes("tfoot .zentriert:nth-child(5)") %>%
html_text() %>%
as.numeric()
#// Mins Played this Season
Minutes <- PlayerTest %>%
html_nodes("tfoot .zentriert:nth-child(7)") %>%
html_text() %>%
as.numeric()
#// Some Cleaning Up of the Data
# to_remove_SquadNo <- paste(c("#"))
# SquadNo <- gsub(to_remove_SquadNo, "", SquadNo)
# Minutes <- regmatches(Minutes, gregexpr("[[:digit:]]+", Minutes))
# as.numeric(unlist(Minutes))
#// Create the Data Frame
output = data.frame(SquadNo, Name, Nationality, Club, Position, DOB, Age, Value, Matches, Goals, Assists, Minutes)
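My goal is to loop the Player Profile Scraper over the URLs produced by the Team Scraper. I've tried many different kinds of loops and I'm lost! Any advice would be much appreciated!
Answer 0 (score: 0)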
Replace
URLLink <- PlayerURLs[13]
with
lapply(PlayerURLs, FUN=function(URLLink){
and add at the end
output
})
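Note that lapply returns a list of data frames, one per player, so the result still has to be combined into a single table. Here is a minimal sketch of the same idea written as functions. The helper names (first_or_na, scrape_player, scrape_all) and the scraper and delay parameters are my own assumptions, not part of rvest, and only three of the question's fields are shown. Taking the first match (or NA) for every selector is also an assumption, made so that data.frame() does not fail when a selector returns zero or several nodes.

```r
# Hypothetical helper: return the first match, or NA when the selector
# finds nothing, so every field has length 1.
first_or_na <- function(x) {
  if (length(x) == 0) NA_character_ else x[[1]]
}

# Scrape one profile page into a one-row data frame.
# Selectors are copied from the question; the remaining fields
# would follow the same pattern.
scrape_player <- function(url) {
  page <- rvest::read_html(url)
  field <- function(css) {
    first_or_na(rvest::html_text(rvest::html_nodes(page, css)))
  }
  data.frame(
    URL = url,
    SquadNo = field(".rueckennummer-profil"),
    Name = field(".spielername-profil"),
    Value = field(".marktwert a"),
    stringsAsFactors = FALSE
  )
}

# Apply a scraper to every URL and stack the one-row results.
# A page that raises an error is reported and skipped.
scrape_all <- function(urls, scraper = scrape_player, delay = 1) {
  rows <- lapply(urls, function(u) {
    Sys.sleep(delay)  # assumed pause between requests
    tryCatch(scraper(u), error = function(e) {
      message("Skipping ", u, ": ", conditionMessage(e))
      NULL
    })
  })
  rows <- Filter(Negate(is.null), rows)  # drop failed pages
  do.call(rbind, rows)
}

# Usage, assuming PlayerURLs from the team scraper in the question:
# output <- scrape_all(PlayerURLs)
```

Passing the scraper as an argument is just a convenience: it lets you try the loop on a stub function before hitting the site.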