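I have two data frames. The first (games) records, for each of several games, the year and which player completed each of some unspecified objectives (player1, player2, player3). The second (rankings) gives each player's ranking in a given year.
My goal is to add a column to the games data frame holding the average ranking of all the players who completed those objectives in each game.
Reproducible example: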
set.seed(0)
players <- c("Abe", "Bob", "Chris", "John", "Jane", "Linda", "Mason", "Zoe", "NA")
years <- c(2000:2005)
season <- sample(years, 20, replace = TRUE)
player1 <- sample(players, 20, replace = TRUE)
player2 <- sample(players, 20, replace = TRUE)
player3 <- sample(players, 20, replace = TRUE)
games <- data.frame(season, player1, player2, player3, stringsAsFactors = FALSE)
rankings <- data.frame(replicate(6,sample(1:5,8,rep=TRUE)))
colnames(rankings) <- years
ranked_players <- players[-9]
rankings <- cbind(ranked_players, rankings)
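games is the first data frame; it shows the year of the game (season) and who was player1, player2, and player3. Not every game has a player in every role.
rankings is the second data frame; it shows each player's ranking, from 1 to 5, in a given year.
For each game, I want to look up the rankings of the players who acted as player1, player2, and player3, and average those rankings.
To look up a ranking, I tried the following function: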
calc_ranking <- function(x, y) {
z <- select(filter(rankings, ranked_players==x), c(y))
z <- as.integer(z[1,1])
z
}
It appears to work. Now I have to apply it to every player in every game, using the year of that game.
I tried this loop:
new_col <- mapply(calc_ranking, games$player1, games$season)
But it doesn't work. It gives me this error:
Error in inds_combine(.vars, ind_list) : Position must be between 0 and n
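But even if it worked, I would still have to repeat this three times to create three columns, one each for player1, player2, and player3, and then create the column I actually want (the mean of those three). I suspect there is a more efficient way to do this without repeating the loop (assuming I can fix it)? That would be very helpful, because in my real dataset I have 13 "roles" for which I have to compute rankings.
I hope this second question is better than my first. Sorry, I have only been learning R for a week (and that is my entire coding experience).
Thank you very much!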
Answer 0 (score: 1)
My understanding is that each row in games is a separate game ID. So:
season player1 player2 player3
2001 Joe Bill Jane
player season ranking
Joe 2001 1
Bill 2001 3
Jane 2001 5
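The expected answer for that game is 3. To solve this, the simplest approach is to melt the data and then merge the two data.frames on season and player name. Edit: added a dcast() line to produce output similar to @heds's.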
library(data.table)
setDT(games)
games[, game_id := seq_len(.N), keyby = season]
molten_games <- melt(games, id.vars = c('season', 'game_id'), variable.name = 'player_number', value.name = 'player')
setDT(rankings)
molten_rankings <- melt(rankings, id.vars = 'ranked_players', variable.name = 'season', value.name = 'ranking', variable.factor = F)[, season:= as.integer(season)]
merged_dt <- molten_rankings[molten_games
, on = .(season
, ranked_players = player )
, nomatch = 0L
]
merged_dt[, mean(ranking, na.rm = T), by = .(season, game_id)]
season game_id V1
1: 2000 1 2.666667
2: 2001 2 2.000000
3: 2001 3 1.333333
4: 2001 4 4.000000
5: 2002 1 3.333333
...
#or if you want all the players and rankings
dcast(merged_dt, season + game_id ~ player_number, value.var = c('ranked_players', 'ranking')
)[, means := rowMeans(.SD), .SDcols = c('ranking_player1', 'ranking_player2', 'ranking_player3')][]
season game_id ranked_players_player1 ranked_players_player2 ranked_players_player3 ranking_player1 ranking_player2 ranking_player3 means
1: 2000 1 John Mason Zoe 2 2 4 2.666667
2: 2001 1 <NA> Zoe <NA> NA 4 NA NA
3: 2001 2 Chris Jane Linda 1 1 4 2.000000
4: 2001 3 Jane Jane John 1 1 2 1.333333
5: 2001 4 Linda Zoe Zoe 4 4 4 4.000000
...
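To check the melt-and-merge idea by hand, here is a minimal base R sketch for the single toy game (Joe, Bill, Jane) shown at the top of this answer. The object names game, ranks, long_game, and merged are made up for illustration and are not part of the solution above.

```r
# Toy data copied from the example at the top of this answer
game <- data.frame(season = 2001L, player1 = "Joe", player2 = "Bill",
                   player3 = "Jane", stringsAsFactors = FALSE)
ranks <- data.frame(player = c("Joe", "Bill", "Jane"), season = 2001L,
                    ranking = c(1, 3, 5), stringsAsFactors = FALSE)

# "Melt": one row per (game, role) instead of one column per role
long_game <- data.frame(
  season = game$season,
  player = unlist(game[c("player1", "player2", "player3")], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Merge on season and player name, then average the matched rankings
merged <- merge(long_game, ranks, by = c("season", "player"))
toy_mean <- mean(merged$ranking)
toy_mean
```

The data.table and dplyr versions in this answer do the same two steps, only with a game_id added so that several games can be averaged at once.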
Since you seem to be using dplyr, here is a similar approach, although I couldn't think of a way to widen the result at the end:
library(dplyr)
library(tidyr)
long_rankings <- rankings%>%
gather(key = 'season', value = 'ranking', - ranked_players)%>%
mutate(season = as.integer(season))
long_games <- games%>%
arrange(season)%>%
group_by(season)%>%
mutate(game_id = row_number())%>%
ungroup()%>%
gather(key = 'player_number', value = 'player', -season, - game_id)
inner_join(long_rankings
,long_games
, by = c('season' = 'season'
, 'ranked_players' = 'player'))%>%
group_by(season, game_id)%>%
summarize(game_rank_ave = mean(ranking, na.rm = T))
season game_id game_rank_ave
<int> <int> <dbl>
1 2000 1 2.67
2 2001 1 4
3 2001 2 2
4 2001 3 1.33
5 2001 4 4
6 2002 1 3.33
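For the widening step, one possible route is pivot_wider() from tidyr (1.0.0 or later) together with across() from dplyr (1.0.0 or later). The sketch below runs on a hand-made joined table for the Joe/Bill/Jane toy game, not on the real data, and the names joined, wide, and means are assumptions chosen to mirror the data.table output.

```r
library(dplyr)
library(tidyr)

# Hand-made stand-in for the result of the inner_join() (toy game only)
joined <- tibble(
  season = c(2001L, 2001L, 2001L),
  game_id = c(1L, 1L, 1L),
  player_number = c("player1", "player2", "player3"),
  ranked_players = c("Joe", "Bill", "Jane"),
  ranking = c(1L, 3L, 5L)
)

wide <- joined %>%
  # One row per game; columns become ranked_players_player1, ranking_player1, ...
  pivot_wider(names_from = player_number,
              values_from = c(ranked_players, ranking)) %>%
  # Row-wise mean over the ranking_* columns only
  mutate(means = rowMeans(across(starts_with("ranking_"))))

wide
```

Because the selection is done with starts_with("ranking_"), the same pipeline should carry over to more than three roles without listing each column.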
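Regarding data generation, be careful with cbind()! Given only vectors or matrices, it coerces everything to a matrix, and a matrix can hold only one type, e.g. character or numeric. (When one argument is a data frame, as here, cbind() goes through data.frame() instead, which in R versions before 4.0 turns the character column into a factor.) I changed how the data.frame is generated to avoid the problem.
Data used: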
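The matrix coercion can be seen in a small sketch. The values and the column name rank_2000 are made up for illustration.

```r
# cbind() on plain vectors builds a matrix, so the numbers become strings
m <- cbind(c("Abe", "Bob"), c(1, 2))
is.matrix(m)   # TRUE
typeof(m)      # "character"

# Adding the names as a new column keeps each column's own type
df <- data.frame(rank_2000 = c(1, 2))
df$ranked_players <- c("Abe", "Bob")
str(df)
```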
set.seed(0)
players <- c("Abe", "Bob", "Chris", "John", "Jane", "Linda", "Mason", "Zoe", "NA")
years <- c(2000:2005)
season <- sample(years, 20, replace = TRUE)
player1 <- sample(players, 20, replace = TRUE)
player2 <- sample(players, 20, replace = TRUE)
player3 <- sample(players, 20, replace = TRUE)
games <- data.frame(season, player1, player2, player3, stringsAsFactors = FALSE)
rankings <- data.frame(replicate(6,sample(1:5,8,rep=TRUE)))
colnames(rankings) <- years
ranked_players <- players[-9]
#rankings <- cbind(ranked_players, rankings) ##don't cbind unless you're making a matrix!
rankings$ranked_players <- players[-9]
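As for the error in the question, one likely cause (an assumption, since the exact message depends on the dplyr version) is that games$season is an integer, and select() reads a number as a column position rather than a column name, so 2005 points far past the last column. A minimal sketch of calc_ranking with the year converted to a character name, on a made-up two-player rankings table:

```r
library(dplyr)

# Made-up stand-in for the rankings table (two players, two years)
rankings <- data.frame(
  ranked_players = c("Abe", "Bob"),
  `2000` = c(1L, 2L),
  `2001` = c(3L, 4L),
  check.names = FALSE, stringsAsFactors = FALSE
)

calc_ranking <- function(x, y) {
  # as.character() makes select() look the year up by name, not by position
  z <- select(filter(rankings, ranked_players == x), all_of(as.character(y)))
  as.integer(z[1, 1])
}

# One lookup per (player, season) pair
new_col <- mapply(calc_ranking, c("Abe", "Bob"), c(2001L, 2000L))
new_col
```

This only fixes the lookup for one role at a time; the melt-and-merge approach above avoids repeating it per role.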
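Answer 1 (score: 1)
I interpreted your expected output differently from @Cole, in that you want a column with the mean for each game's players. My strategy was to extract each player into their own data frame (which meant changing the column names to something non-numeric). Then, for each of player1, player2, and player3, I find the player in that "position" and look up their ranking in their own data frame. I'm sure there are better ways to do this, but it works in this case (although I'm not sure it is exactly what you're after).
Output: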
> head(games,10)
season player1 player2 player3 player1_rank player2_rank player3_rank means
1 year2005 Mason John John 3 3 3 3.000000
2 year2001 <NA> Zoe <NA> NA 4 NA NA
3 year2002 Bob Linda Chris 3 4 3 3.333333
4 year2003 Linda Zoe Jane 5 5 3 4.333333
5 year2005 Bob Jane Chris 5 1 3 3.000000
6 year2001 Chris Jane Linda 1 1 4 2.000000
7 year2005 John Zoe Chris 3 3 3 3.000000
8 year2005 Abe Abe Jane 4 4 1 3.000000
9 year2003 John Jane Mason 1 3 3 2.333333
10 year2003 Zoe Mason Abe 5 3 5 4.333333
Code:
library(dplyr)  # for filter()
for (player in players){
temp_player <- filter(rankings, ranked_players == player)
colnames(temp_player) <- c("player","year2000","year2001","year2002","year2003","year2004","year2005")
assign(paste(player), temp_player)
}
games$season <- paste0("year",games$season)
games[games=="NA"] <- NA
i <- 1
for (rows in games$player1){
if (!is.na(games$player1[i])) {
season <- games$season[i]
games$player1_rank[i] <- get(games$player1[i])[,season]
}
else
{
games$player1_rank[i] <- NA
}
i <- i + 1
}
i <- 1
for (rows in games$player2){
if (!is.na(games$player2[i])) {
season <- games$season[i]
games$player2_rank[i] <- get(games$player2[i])[,season]
}
else
{
games$player2_rank[i] <- NA
}
i <- i + 1
}
i <- 1
for (rows in games$player3){
if (!is.na(games$player3[i])) {
season <- games$season[i]
games$player3_rank[i] <- get(games$player3[i])[,season]
}
else
{
games$player3_rank[i] <- NA
}
i <- i + 1
}
games$means <- rowMeans(games[,5:7])
Data:
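The three near-identical loops can be collapsed into one lookup per role. The following is a sketch on small made-up data, not a drop-in replacement: rank_mat, role_cols, and lookup_role are hypothetical names, and it uses na.rm = TRUE, so a game with a missing player gets the mean of the remaining players rather than NA as in the output above.

```r
# Made-up data: three players, two seasons, one missing player1
rankings <- data.frame(
  ranked_players = c("Abe", "Bob", "Zoe"),
  `2000` = c(1L, 2L, 3L),
  `2001` = c(4L, 5L, 1L),
  check.names = FALSE, stringsAsFactors = FALSE
)
games <- data.frame(
  season  = c(2000L, 2001L),
  player1 = c("Abe", NA),
  player2 = c("Bob", "Zoe"),
  player3 = c("Zoe", "Abe"),
  stringsAsFactors = FALSE
)

# Rankings as a matrix: rows are players, columns are seasons
rank_mat <- as.matrix(rankings[, -1])
rownames(rank_mat) <- rankings$ranked_players

role_cols <- c("player1", "player2", "player3")  # extend to all 13 roles

# For one role, pick the (player, season) cell for every game at once;
# a row index of NA (missing or unranked player) yields NA
lookup_role <- function(col) {
  idx <- cbind(match(games[[col]], rownames(rank_mat)),
               match(as.character(games$season), colnames(rank_mat)))
  rank_mat[idx]
}

ranks <- do.call(cbind, lapply(role_cols, lookup_role))
colnames(ranks) <- paste0(role_cols, "_rank")
games$means <- rowMeans(ranks, na.rm = TRUE)
games
```

Dropping na.rm = TRUE reproduces the behaviour of the loops, where any missing player makes the game's mean NA.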
set.seed(0)
players <- c("Abe", "Bob", "Chris", "John", "Jane", "Linda", "Mason", "Zoe", "NA")
years <- c(2000:2005)
season <- sample(years, 20, replace = TRUE)
player1 <- sample(players, 20, replace = TRUE)
player2 <- sample(players, 20, replace = TRUE)
player3 <- sample(players, 20, replace = TRUE)
games <- data.frame(season, player1, player2, player3, stringsAsFactors = FALSE)
rankings <- data.frame(replicate(6,sample(1:5,8,rep=TRUE)))
colnames(rankings) <- years
ranked_players <- players[-9]
rankings <- cbind(ranked_players, rankings)