I have a data frame similar to this:
teamAPlayer1 teamAPlayer2 teamBPlayer1 teamBPlayer2
Jack Jill Matt Megan
Jill Jack Megan Matt
Megan Jill Matt Jack
Megan Matt Jill Jack
Megan Jack Jill Matt
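My goal is to assign a unique ID to each distinct team line-up, regardless of the order of the players and of whether they are on team A or team B. For the example above, I want to add the following two columns to my data frame: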
teamAPlayer1 teamAPlayer2 teamAID teamBPlayer1 teamBPlayer2 teamBID
Jack Jill 1 Matt Megan 2
Jill Jack 1 Megan Matt 2
Megan Jill 3 Matt Jack 4
Megan Matt 2 Jill Jack 1
Jack Matt 4 Jill Megan 3
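I could write a solution that indexes with for/while loops, but I'm working with a very large data frame with 5 players per team instead of 2, so the script takes a long time to run. Is it possible to solve this with a vectorized approach?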
Answer 0 (score: 0)
Your data:
df <- data.frame(teamAPlayer1=c("Jack","Jill","Megan","Megan","Megan"),
                 teamAPlayer2=c("Jill","Jack","Jill","Matt","Jack"),
                 teamBPlayer1=c("Matt","Megan","Matt","Jill","Jill"),
                 teamBPlayer2=c("Megan","Matt","Jack","Jack","Matt"),
                 stringsAsFactors=FALSE)
Make a vector of unique player names:
# Grab all unique player names - assign to each a number
unique.id <- seq(1, length(unique(unlist(df))), 1)
names(unique.id) <- unique(unlist(df))
library(dplyr)  # %>%, rowwise(), mutate(), select()
# Paste and sort player pair combinations in new columns
# (rowwise() evaluates the expressions one row at a time)
df1 <- df %>%
  rowwise() %>%
  mutate(teamApairs=paste0(sort(c(unique.id[teamAPlayer1],unique.id[teamAPlayer2])),collapse=" ")) %>%
  mutate(teamBpairs=paste0(sort(c(unique.id[teamBPlayer1],unique.id[teamBPlayer2])),collapse=" "))
Make a vector of unique player pairs:
# Grab all unique player pairs - assign to each a unique number
unique.pairs <- seq(1, length(unique(unlist(df1[,5:6]))), 1)
names(unique.pairs) <- unique(unlist(df1[,5:6]))
# Factorize unique player pairs as unique number
df2 <- df1 %>%
mutate(teamAID=unique.pairs[teamApairs]) %>%
mutate(teamBID=unique.pairs[teamBpairs]) %>%
select(-teamApairs,-teamBpairs)
Output:
teamAPlayer1 teamAPlayer2 teamBPlayer1 teamBPlayer2 teamAID teamBID
1 Jack Jill Matt Megan 1 3
2 Jill Jack Megan Matt 1 3
3 Megan Jill Matt Jack 2 5
4 Megan Matt Jill Jack 3 1
5 Megan Jack Jill Matt 4 6
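Because rowwise() builds the pair strings one row at a time, it will still be slow on a very large data frame. Below is a sketch of the same integer-code idea without rowwise(), assuming the team columns can be selected by the prefixes teamA and teamB; team_key() is a hypothetical helper, not a dplyr function. It sorts every row in a single call with order(row(codes), codes), so the same code handles five-player teams.

```r
df <- data.frame(teamAPlayer1 = c("Jack", "Jill", "Megan", "Megan", "Megan"),
                 teamAPlayer2 = c("Jill", "Jack", "Jill", "Matt", "Jack"),
                 teamBPlayer1 = c("Matt", "Megan", "Matt", "Jill", "Jill"),
                 teamBPlayer2 = c("Megan", "Matt", "Jack", "Jack", "Matt"),
                 stringsAsFactors = FALSE)

# Same player numbering as unique.id above (order of first appearance)
players <- unique(unlist(df))

# Hypothetical helper: one order-independent string key per row of team columns
team_key <- function(team_df) {
  # Integer code of each player, kept in an n x k matrix
  codes <- matrix(match(as.matrix(team_df), players), nrow = nrow(team_df))
  # Ordering by row index first, then by code, sorts every row at once
  o <- order(row(codes), codes)
  sorted <- matrix(codes[o], nrow = nrow(codes), byrow = TRUE)
  # Paste the sorted columns together, e.g. "1 2"
  do.call(paste, lapply(seq_len(ncol(sorted)), function(j) sorted[, j]))
}

keyA <- team_key(df[grep("^teamA", names(df))])
keyB <- team_key(df[grep("^teamB", names(df))])
all_keys <- unique(c(keyA, keyB))  # all A teams first, as in the answer above
df$teamAID <- match(keyA, all_keys)
df$teamBID <- match(keyB, all_keys)
```

On the example data this reproduces the teamAID and teamBID columns of the output above.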
Answer 1 (score: 0)
Your expected output doesn't match your input (see the last row), but I think this does what you need:
df <- read.table(text="teamAPlayer1 teamAPlayer2 teamBPlayer1 teamBPlayer2
Jack Jill Matt Megan
Jill Jack Megan Matt
Megan Jill Matt Jack
Megan Matt Jill Jack
Megan Jack Jill Matt",stringsAsFactors=FALSE,header=TRUE)
library(data.table)  # as.data.table(), := and .GRP
library(magrittr)    # %>%, %<>% and set_colnames()
dt_concat <- matrix(unlist(t(df)),ncol=2,byrow=TRUE) %>% # create a two-column matrix with one team composition per row
  cbind(.,team = apply(.,1,. %>% sort %>% paste(collapse=" "))) %>% as.data.table # add a column with the sorted team members as one string
dt_concat[, teamID := .GRP, by = team] # assign ids in order of first appearance
df %<>% cbind(dt_concat$teamID %>% matrix(ncol=2,byrow=TRUE) %>% set_colnames(c("teamAID","teamBID"))) # add the ids to the original df
# teamAPlayer1 teamAPlayer2 teamBPlayer1 teamBPlayer2 teamAID teamBID
# 1 Jack Jill Matt Megan 1 2
# 2 Jill Jack Megan Matt 1 2
# 3 Megan Jill Matt Jack 3 4
# 4 Megan Matt Jill Jack 2 1
# 5 Megan Jack Jill Matt 5 6
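The same reshaping idea can be sketched in base R for any team size, assuming the columns are ordered as all team-A players followed by all team-B players. team_size is a hypothetical parameter, and match(key, unique(key)) stands in for .GRP, since both number the groups in order of first appearance. The sort inside each team still goes through apply(), so only the reshaping and the numbering are vectorized.

```r
df <- read.table(text = "teamAPlayer1 teamAPlayer2 teamBPlayer1 teamBPlayer2
Jack Jill Matt Megan
Jill Jack Megan Matt
Megan Jill Matt Jack
Megan Matt Jill Jack
Megan Jack Jill Matt", stringsAsFactors = FALSE, header = TRUE)

team_size <- 2  # hypothetical parameter: set to 5 for five-player teams

# One row per team: team A of row 1, team B of row 1, team A of row 2, ...
teams <- matrix(t(as.matrix(df)), ncol = team_size, byrow = TRUE)
# Sorted member names as one string, so "Jill Jack" and "Jack Jill" coincide
key <- apply(teams, 1, function(p) paste(sort(p), collapse = " "))
# Number the keys in order of first appearance (the role .GRP plays above)
team_id <- match(key, unique(key))
# Fold the ids back to one row per game: column 1 is team A, column 2 is team B
ids <- matrix(team_id, ncol = 2, byrow = TRUE,
              dimnames = list(NULL, c("teamAID", "teamBID")))
df <- cbind(df, ids)
```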
Answer 2 (score: 0)
Here is an option using pmin and pmax:
v1 <- paste(do.call(pmin, df[1:2]), do.call(pmax, df[1:2]))
v2 <- paste(do.call(pmin, df[3:4]), do.call(pmax, df[3:4]))
v3 <- unique(c(rbind(v1, v2)))
teamAID <- match(v1, v3)
#[1] 1 1 3 2 5
teamBID <- match(v2, v3)
#[1] 2 2 4 1 6
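pmin/pmax on their own only order two columns. One way to extend the trick to k players per team, sketched here, is a bubble-sort network of pmin/pmax compare-and-swap steps across the columns: k*(k-1)/2 steps (10 for five players), each of which processes all rows at once. sort_cols() and team_key() are hypothetical helpers, and the columns are assumed to be character, as in the data above.

```r
# Bubble-sort network: after the loop, each row's values are in increasing order
sort_cols <- function(m) {
  k <- ncol(m)
  for (i in seq_len(k - 1)) {
    for (j in seq_len(k - i)) {
      lo <- pmin(m[, j], m[, j + 1])  # vectorized over all rows
      hi <- pmax(m[, j], m[, j + 1])
      m[, j] <- lo
      m[, j + 1] <- hi
    }
  }
  m
}

# Hypothetical helper: one order-independent key per row of team columns
team_key <- function(team_df) {
  m <- sort_cols(as.matrix(team_df))
  do.call(paste, lapply(seq_len(ncol(m)), function(j) m[, j]))
}

df <- data.frame(teamAPlayer1 = c("Jack", "Jill", "Megan", "Megan", "Megan"),
                 teamAPlayer2 = c("Jill", "Jack", "Jill", "Matt", "Jack"),
                 teamBPlayer1 = c("Matt", "Megan", "Matt", "Jill", "Jill"),
                 teamBPlayer2 = c("Megan", "Matt", "Jack", "Jack", "Matt"),
                 stringsAsFactors = FALSE)

v1 <- team_key(df[1:2])  # for five players: df[1:5] and df[6:10]
v2 <- team_key(df[3:4])
v3 <- unique(c(rbind(v1, v2)))
teamAID <- match(v1, v3)
teamBID <- match(v2, v3)
```

For two columns this gives the same keys, and therefore the same IDs, as the pmin/pmax code above.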
Answer 3 (score: 0)
I'd suggest reshaping your original data altogether.
library(data.table)
library(magrittr)
setDT(df)
df %>%
.[, Round := 1:.N] %>%
.[] # this is only here to view the result
teamAPlayer1 teamAPlayer2 teamBPlayer1 teamBPlayer2 Round
1: Jack Jill Matt Megan 1
2: Jill Jack Megan Matt 2
3: Megan Jill Matt Jack 3
4: Megan Matt Jill Jack 4
5: Megan Jack Jill Matt 5
That is, each row of the original data is identified by Round (the tournament round). You can then reshape the data:
df %>%
.[, Round := 1:.N] %>%
melt.data.table(id.vars = "Round",
value.name = "participant") %>%
.[, Event := gsub("team([AB]).*$", "\\1", variable)] %>%
# Ordering by participant necessary to define
# distinct combinations JackJill == JillJack
.[order(Round, participant, Event)] %>%
.[,
.(Team = paste0(participant, collapse = "")),
keyby = .(Round, Event)]
Round Event Team
1: 1 A JackJill
2: 1 B MattMegan
3: 2 A JackJill
4: 2 B MattMegan
5: 3 A JillMegan
6: 3 B JackMatt
7: 4 A MattMegan
8: 4 B JackJill
9: 5 A JackMegan
10: 5 B JillMatt
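This format has many advantages. For example, you can add another column, "Score", that refers unambiguously to a specific game instead of relying on the order of the columns. However, if you want something closer to the original layout, you can always dcast: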
df %>%
.[, Round := 1:.N] %>%
melt.data.table(id.vars = "Round",
value.name = "participant") %>%
.[, Event := gsub("team([AB]).*$", "\\1", variable)] %>%
# Ordering by participant necessary to define
# distinct combinations JackJill == JillJack
.[order(Round, participant, Event)] %>%
.[,
.(Team = paste0(participant, collapse = "")),
keyby = .(Round, Event)] %>%
dcast.data.table(Round ~ Event)
Round A B
1: 1 JackJill MattMegan
2: 2 JackJill MattMegan
3: 3 JillMegan JackMatt
4: 4 MattMegan JackJill
5: 5 JackMegan JillMatt
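To go from these Team strings to the integer IDs asked for in the question, one option is the same match()/unique() step used in the other answers. Below is a base R sketch on the dcast result shown above, retyped as a plain data frame; on the long table the data.table equivalent would be res[, TeamID := match(Team, unique(Team))], where res is a hypothetical name for that result. Note that paste0(..., collapse = "") can in principle map two different line-ups to the same string, because the boundaries between names are lost; a separator that cannot occur in names, such as collapse = " ", avoids that.

```r
# The dcast output above, retyped as a data frame
wide <- data.frame(Round = 1:5,
                   A = c("JackJill", "JackJill", "JillMegan", "MattMegan", "JackMegan"),
                   B = c("MattMegan", "MattMegan", "JackMatt", "JackJill", "JillMatt"),
                   stringsAsFactors = FALSE)

# Interleave A and B by round so IDs follow the order of first appearance
keys <- unique(c(rbind(wide$A, wide$B)))
wide$teamAID <- match(wide$A, keys)
wide$teamBID <- match(wide$B, keys)
```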