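I have a dataset of ball-by-ball cricket data, ordered by the sequence of events in each match (i.e. "over" in the data frame). I want to add a column that assigns a batting position (1, 2, 3, ...) to each batsman within each batting_team and match_id. Two batsmen are in at a time, so one batsman may face a ball, then the other, and then the first one again.
I've tried things like tally(), but that doesn't quite do what I want. I suspect there may be a solution using factors, but I can't figure out how to do it by group.
Here is a sample data frame: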
mydata <- data.frame(batting_team=c(rep("South Africa",6),rep("England",6)),
                     match_id=c(rep(343434,6),rep(353535,6)),
                     over=rep(seq(0.1,0.6,0.1),2),
                     batsman=c("HM Amla","HM Amla","GC Smith","HM Amla","JH Kallis","JH Kallis",
                               "JJ Roy","JJ Roy","JJ Roy","JM Bairstow","JM Bairstow","JJ Roy"))
Here is my desired output:
   batting_team match_id over     batsman batting_order
1  South Africa   343434  0.1     HM Amla             1
2  South Africa   343434  0.2     HM Amla             1
3  South Africa   343434  0.3    GC Smith             2
4  South Africa   343434  0.4     HM Amla             1
5  South Africa   343434  0.5   JH Kallis             3
6  South Africa   343434  0.6   JH Kallis             3
7       England   353535  0.1      JJ Roy             1
8       England   353535  0.2      JJ Roy             1
9       England   353535  0.3      JJ Roy             1
10      England   353535  0.4 JM Bairstow             2
11      England   353535  0.5 JM Bairstow             2
12      England   353535  0.6      JJ Roy             1
Answer 0 (score: 2)
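One option is to group by 'batting_team' and match 'batsman' against the unique 'batsman' values in each group. Because unique() keeps values in order of first appearance, the index returned by match() is the batting position.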
library(dplyr)
mydata %>%
    group_by(batting_team) %>%
    mutate(batting_order = match(batsman, unique(batsman)))
# A tibble: 12 x 5
# Groups:   batting_team [2]
#   batting_team match_id  over batsman     batting_order
#   <fct>           <dbl> <dbl> <fct>               <int>
# 1 South Africa   343434   0.1 HM Amla                 1
# 2 South Africa   343434   0.2 HM Amla                 1
# 3 South Africa   343434   0.3 GC Smith                2
# 4 South Africa   343434   0.4 HM Amla                 1
# 5 South Africa   343434   0.5 JH Kallis               3
# 6 South Africa   343434   0.6 JH Kallis               3
# 7 England        353535   0.1 JJ Roy                  1
# 8 England        353535   0.2 JJ Roy                  1
# 9 England        353535   0.3 JJ Roy                  1
#10 England        353535   0.4 JM Bairstow             2
#11 England        353535   0.5 JM Bairstow             2
#12 England        353535   0.6 JJ Roy                  1
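The question asks for positions within each batting_team and match_id, while the code above groups by batting_team only. On the sample data the result is the same, since each team appears in a single match, but with several matches per team the grouping would probably need both keys. Below is a minimal base R sketch of that variant using ave(), assuming the rows are already in match order. It applies the same match()/unique() idea and is not necessarily how the answerer would write it. In dplyr the equivalent change would presumably be group_by(match_id, batting_team).

```r
# Sample data from the question (strings kept as character for simplicity)
mydata <- data.frame(
  batting_team = c(rep("South Africa", 6), rep("England", 6)),
  match_id = c(rep(343434, 6), rep(353535, 6)),
  over = rep(seq(0.1, 0.6, 0.1), 2),
  batsman = c("HM Amla", "HM Amla", "GC Smith", "HM Amla", "JH Kallis", "JH Kallis",
              "JJ Roy", "JJ Roy", "JJ Roy", "JM Bairstow", "JM Bairstow", "JJ Roy"),
  stringsAsFactors = FALSE
)

# ave() splits the row indices by match and team. Inside each group,
# a batsman's position is the index of his name among the unique names,
# which unique() returns in order of first appearance.
mydata$batting_order <- ave(
  seq_len(nrow(mydata)),
  mydata$match_id, mydata$batting_team,
  FUN = function(i) match(mydata$batsman[i], unique(mydata$batsman[i]))
)

print(mydata$batting_order)
# [1] 1 1 2 1 3 3 1 1 1 2 2 1
```

Passing row indices to ave() rather than the names themselves keeps the result as integers, since ave() returns a vector of the same type as its first argument.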