Loop Scraping in R with rvest - Football Stats

Time: 2016-02-05 00:08:28

Tags: r rvest

I'm trying to get R to loop through player profiles on transfermarkt.com. I first get the squad's player URLs with the following:

library(rvest)

#/ Add the Team’s URL to scrape

TeamScrape <- read_html("http://www.transfermarkt.com/jumplist/startseite/verein/2778")

#// Get Club Name

ClubName <- TeamScrape %>%
  html_nodes(".spielername-profil") %>%
  html_text()

#// Get All Player URLs

PlayerURLs <- TeamScrape %>%
  html_nodes(".spielprofil_tooltip") %>%
  html_attr("href")

PlayerURLs <- unique(PlayerURLs)
PlayerURLs <- na.omit(PlayerURLs)

PlayerURLs <- paste0("http://www.transfermarkt.com", PlayerURLs)

PlayerLinks = data.frame(ClubName, PlayerURLs)

This gives me a data.frame containing the URLs that I want to loop through my next scraper, the 'Player Profile Scraper':

#/ Add the Player’s URL that you want to scrape
URLLink <- PlayerURLs[13]
PlayerTest <- read_html(URLLink)

#// Squad No

SquadNo <- PlayerTest %>%
  html_nodes(".rueckennummer-profil") %>%
  html_text()

#// Name

Name <- PlayerTest %>%
  html_nodes(".spielername-profil") %>%
  html_text()

#// Nationality

Nationality <- PlayerTest %>%
  html_nodes(".flaggenrahmen+ span") %>%
  html_text()

#// Club

Club <- PlayerTest %>%
  html_nodes(".vereinprofil_tooltip+ .vereinprofil_tooltip") %>%
  html_text()

#// Position

Position <- PlayerTest %>%
  html_nodes(".list+ .list tr:nth-child(3) td") %>%
  html_text()

#// DOB

DOB <- PlayerTest %>%
  html_nodes(".wsnw") %>%
  html_text()

#// Age

Age <- PlayerTest %>%
  html_nodes(".profilheader .hide-for-small td") %>%
  html_text() %>%
  as.numeric()

#// Value

Value <- PlayerTest %>%
  html_nodes(".marktwert a") %>%
  html_text()

#// Matches Played this Season

Matches <- PlayerTest %>%
  html_nodes(".hide.hide-for-small+ .zentriert") %>%
  html_text() %>%
  as.numeric()

#// Goals Scored this Season

Goals <- PlayerTest %>%
  html_nodes("#yw1 tfoot .zentriert:nth-child(4)") %>%
  html_text() %>%
  as.numeric()

#// Assists Made this Season

Assists <- PlayerTest %>%
  html_nodes("tfoot .zentriert:nth-child(5)") %>%
  html_text() %>%
  as.numeric()

#// Mins Played this Season

Minutes <- PlayerTest %>%
  html_nodes("tfoot .zentriert:nth-child(7)") %>%
  html_text() %>%
  as.numeric()

#// Some Cleaning Up of the Data 

# to_remove_SquadNo <- paste(c("#"))
# SquadNo <- gsub(to_remove_SquadNo, "", SquadNo)

# Minutes <- regmatches(Minutes, gregexpr("[[:digit:]]+", Minutes))
# as.numeric(unlist(Minutes))

#// Create the Data Frame 

output = data.frame(SquadNo, Name, Nationality, Club, Position, DOB, Age, Value, Matches, Goals, Assists, Minutes)
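One way to make the profile scraper loopable is to wrap it in a function of a parsed page, so the same code runs once per URL. The sketch below does that under stated assumptions: `first_text`, `first_number` and `scrape_player` are hypothetical names, the CSS selectors are copied from the question and may no longer match the current transfermarkt.com markup, and each field keeps only the first match (so a second nationality, for example, is dropped). Returning `NA` for an empty selector matters inside a loop, because `data.frame()` stops with "arguments imply differing number of rows" as soon as one field comes back with length 0.

```r
library(rvest)

# Hypothetical helper (not an rvest function): trimmed text of the first node
# matching `css`, or NA when nothing matches. Returning exactly one value per
# field keeps data.frame() from failing with "differing number of rows".
first_text <- function(page, css) {
  txt <- html_text(html_nodes(page, css), trim = TRUE)
  if (length(txt) == 0) NA_character_ else txt[[1]]
}

# Hypothetical helper: like first_text(), but keeps only the digits before
# converting, e.g. "#9" -> 9 and "1.234'" -> 1234. This assumes "." is a
# thousands separator, as in the site's minutes column; a "-" becomes NA.
first_number <- function(page, css) {
  as.numeric(gsub("[^0-9]", "", first_text(page, css)))
}

# Hypothetical wrapper around the question's profile scraper: takes a parsed
# page and returns a one-row data frame. Selectors are copied from the question.
scrape_player <- function(page) {
  data.frame(
    SquadNo     = first_number(page, ".rueckennummer-profil"),
    Name        = first_text(page, ".spielername-profil"),
    Nationality = first_text(page, ".flaggenrahmen+ span"),
    Club        = first_text(page, ".vereinprofil_tooltip+ .vereinprofil_tooltip"),
    Position    = first_text(page, ".list+ .list tr:nth-child(3) td"),
    DOB         = first_text(page, ".wsnw"),
    Age         = first_number(page, ".profilheader .hide-for-small td"),
    Value       = first_text(page, ".marktwert a"),
    Matches     = first_number(page, ".hide.hide-for-small+ .zentriert"),
    Goals       = first_number(page, "#yw1 tfoot .zentriert:nth-child(4)"),
    Assists     = first_number(page, "tfoot .zentriert:nth-child(5)"),
    Minutes     = first_number(page, "tfoot .zentriert:nth-child(7)"),
    stringsAsFactors = FALSE
  )
}
```

With this in place, `scrape_player(read_html(PlayerURLs[13]))` should return the same kind of row as the question's `output`, with `NA` wherever a selector found nothing instead of an error.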

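My goal is to loop the player profile scraper over the URLs produced by the team scraper. I've tried lots of different loops and I'm lost! Any advice would be much appreciated!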

1 Answer:

Answer 0 (score: 0)

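Replace

URLLink <- PlayerURLs[13]

with

results <- lapply(PlayerURLs, FUN = function(URLLink) {

and close the function at the end so that each iteration returns its data frame, then stack the per-player data frames into one:

output
})
AllPlayers <- do.call(rbind, results)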
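The `lapply()` call returns a list with one data frame per player, and a single failing page (a 404, a timeout, or a `data.frame()` error from an empty field) aborts the whole run. The following minimal base-R sketch handles both: `scrape_all` is a hypothetical name, its `scrape_one` argument stands for the question's profile scraper wrapped in a function of one URL, and the one-second pause between requests is an arbitrary choice, not a documented site requirement.

```r
# Minimal sketch of a "scrape many URLs" loop. `scrape_all` and `scrape_one`
# are hypothetical names: `scrape_one` must take one URL (a character string)
# and return a one-row data frame.
scrape_all <- function(urls, scrape_one, pause = 1) {
  # as.character() also handles a factor column such as PlayerLinks$PlayerURLs
  rows <- lapply(as.character(urls), function(url) {
    Sys.sleep(pause)  # wait between requests so the site is not hammered
    tryCatch(
      scrape_one(url),
      error = function(e) {
        # Report the failing page and skip it instead of aborting the loop
        message("Skipping ", url, ": ", conditionMessage(e))
        NULL
      }
    )
  })
  # rbind() ignores the NULL entries left by failed pages and stacks the rest
  do.call(rbind, rows)
}

# Usage, assuming the question's scraper is wrapped as scrape_url(url):
# AllPlayers <- scrape_all(PlayerURLs, scrape_url, pause = 2)
```

Keeping the per-page scraper separate from the loop means it can be debugged on a single URL (as the question already does with `PlayerURLs[13]`) before being run over the whole squad.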