dplyr: group by two columns and count the unique values in one of them

Time: 2018-07-30 14:19:22

Tags: r dplyr

I have a data frame:

id          player  
8297682400  Player1
8297692740  Player1
8255798760  Player1
8255798760  Player1
8255798760  Player1
8255799456  Player2
8255799456  Player2
8255799456  Player2
8255866000  Player2
8255866000  Player2
8255866000  Player2
8255826600  Player1
8255826600  Player1
8255826600  Player1
8255854600  Player2
8255854700  Player1

If I use group_by(player, id), I know I can easily number the rows within each group with %>% mutate(counter = 1:n()).

But how can I count the unique id values for each player, "pausing" the count whenever a duplicate is found?

I want:

id          player  id_counter
8297682400  Player1 1
8297692740  Player1 2
8255798760  Player1 3
8255798760  Player1 3
8255798760  Player1 3
8255799456  Player2 1
8255799456  Player2 1
8255799456  Player2 1
8255866000  Player2 2
8255866000  Player2 2
8255866000  Player2 2
8255826600  Player1 4
8255826600  Player1 4
8255826600  Player1 4
8255854600  Player2 3
8255854700  Player1 5

1 answer:

Answer 0 (score: 4)

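We can use match. Within each player, it returns the position of each id in unique(id), so repeated ids get the same number: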

df1 %>%
   group_by(player) %>%
   mutate(id_counter = match(id, unique(id)))
# A tibble: 16 x 3
# Groups:   player [2]
#           id player  id_counter
#        <dbl> <chr>        <int>
# 1 8297682400 Player1          1
# 2 8297692740 Player1          2
# 3 8255798760 Player1          3
# 4 8255798760 Player1          3
# 5 8255798760 Player1          3
# 6 8255799456 Player2          1
# 7 8255799456 Player2          1
# 8 8255799456 Player2          1
# 9 8255866000 Player2          2
#10 8255866000 Player2          2
#11 8255866000 Player2          2
#12 8255826600 Player1          4
#13 8255826600 Player1          4
#14 8255826600 Player1          4
#15 8255854600 Player2          3
#16 8255854700 Player1          5
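
To see why this works, here is a minimal base R sketch on a single player's ids. The vector ids is made up for illustration (it copies Player1's rows from the question) and is not part of the original answer. unique() keeps values in order of first appearance, and match() returns each element's position in that vector.

```r
# Player1's ids from the question, in row order (illustrative vector)
ids <- c(8297682400, 8297692740, 8255798760, 8255798760, 8255798760,
         8255826600, 8255826600, 8255826600, 8255854700)

# unique() preserves the order of first appearance
first_seen <- unique(ids)

# match() gives the position of each id in first_seen,
# so the counter only advances when a new id shows up
id_counter <- match(ids, first_seen)
print(id_counter)
# [1] 1 2 3 3 3 4 4 4 5
```

group_by(player) simply applies this same computation separately to each player's rows.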

Or convert id to a factor and coerce it to integer:

df1 %>%
   group_by(player) %>% 
   mutate(id_counter = as.integer(factor(id, levels = unique(id))))
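
The levels = unique(id) argument matters here. As a small sketch with a made-up vector ids (an assumption for illustration, not from the original answer): without it, factor() sorts the levels, so the integer codes follow sorted order rather than order of first appearance.

```r
# First few ids from the question, in row order (illustrative vector)
ids <- c(8297682400, 8297692740, 8255798760, 8255798760)

# Levels in order of first appearance: codes follow row order
in_order <- as.integer(factor(ids, levels = unique(ids)))
print(in_order)
# [1] 1 2 3 3

# Default levels are sorted, so the smallest id gets code 1
sorted <- as.integer(factor(ids))
print(sorted)
# [1] 2 3 1 1
```

If a counter based on sorted id order is what you actually want, dropping the levels argument would be one way to get it.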