Numbering duplicated rows in dplyr

Time: 2017-11-08 04:20:45

Tags: r dplyr

I ran into a problem numbering duplicated rows in a data.frame and could not find a similar post.

Suppose we have data like this:

df <- data.frame(gr=gl(7,2),x=c("a","a","b","b","c","c","a","a","c","c","d","d","a","a"))


    > df
   gr x
1   1 a
2   1 a
3   2 b
4   2 b
5   3 c
6   3 c
7   4 a
8   4 a
9   5 c
10  5 c
11  6 d
12  6 d
13  7 a
14  7 a

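I would like to add a new column named x_dupl that numbers the occurrences of each x value: the first block of rows with a given x gets 1, the second block with that same x gets 2, the third gets 3, and so on.

Thanks in advance!

Expected output: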

> df
   gr x x_dupl
1   1 a      1
2   1 a      1
3   2 b      1
4   2 b      1
5   3 c      1
6   3 c      1
7   4 a      2
8   4 a      2
9   5 c      2
10  5 c      2
11  6 d      1
12  6 d      1
13  7 a      3
14  7 a      3

2 answers:

Answer 0 (score: 2)

Your example data (including the gr = 7 rows shown in the output), named df1 rather than df:

df1 <- data.frame(gr = gl(7,2),
                  x  = c("a","a","b","b","c","c","a","a","c","c","d","d","a","a"))

library(dplyr)
df1 %>% 
  group_by(x) %>% 
  mutate(x_dupl = dense_rank(gr)) %>%
  ungroup()

# A tibble: 14 x 3
       gr      x x_dupl
   <fctr> <fctr>  <int>
 1      1      a      1
 2      1      a      1
 3      2      b      1
 4      2      b      1
 5      3      c      1
 6      3      c      1
 7      4      a      2
 8      4      a      2
 9      5      c      2
10      5      c      2
11      6      d      1
12      6      d      1
13      7      a      3
14      7      a      3
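The pipeline above works because `gr` already labels each block of identical `x` values, so ranking `gr` within each `x` counts the blocks. If no such column were available, one possible variant (a sketch, not part of the original answer) is to derive a run id from `x` itself. The helper columns `x_chr` and `run_id` are hypothetical names introduced only for illustration.

```r
library(dplyr)

df1 <- data.frame(gr = gl(7, 2),
                  x  = c("a","a","b","b","c","c","a","a","c","c","d","d","a","a"))

res <- df1 %>%
  # x_chr: character copy of x, so the comparison works whether x is a factor or not
  mutate(x_chr  = as.character(x),
         # run_id (hypothetical helper) increases by 1 each time x changes between rows
         run_id = cumsum(x_chr != lag(x_chr, default = first(x_chr)))) %>%
  group_by(x) %>%
  # within each x value, rank its runs in order of appearance
  mutate(x_dupl = dense_rank(run_id)) %>%
  ungroup() %>%
  select(-x_chr, -run_id)
```

This variant assumes the rows are already in the order in which the blocks should be counted.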

Answer 1 (score: 1)

Base R solution:

df <- data.frame(gr=gl(7,2),x=c("a","a","b","b","c","c","a","a","c","c","d","d","a","a"))

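# Encode x as integer codes via factor(); as.numeric() on a character column
# (the default in R >= 4.0, where stringsAsFactors = FALSE) would return NA.
x <- rle(as.integer(factor(df$x)))
# Replace each run's value with its occurrence number among runs of the same value
x$values <- ave(x$values, x$values, FUN = seq_along)
# Expand the runs back to one value per row
df$x_dupl <- inverse.rle(x)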
#    gr x x_dupl
# 1   1 a      1
# 2   1 a      1
# 3   2 b      1
# 4   2 b      1
# 5   3 c      1
# 6   3 c      1
# 7   4 a      2
# 8   4 a      2
# 9   5 c      2
# 10  5 c      2
# 11  6 d      1
# 12  6 d      1
# 13  7 a      3
# 14  7 a      3
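To see why the run-length approach works, it may help to look at the intermediate values. The following is a minimal sketch on the same `x` vector; the names `runs` and `occurrence` are made up for illustration and do not appear in the answer above.

```r
x <- c("a","a","b","b","c","c","a","a","c","c","d","d","a","a")

# rle() collapses consecutive equal values into runs
runs <- rle(x)
runs$values    # one entry per run: "a" "b" "c" "a" "c" "d" "a"
runs$lengths   # length of each run: 2 2 2 2 2 2 2

# For each run, count how many runs with the same value have appeared so far
occurrence <- ave(seq_along(runs$values), runs$values, FUN = seq_along)

# Repeat each run's occurrence number once per row in that run
x_dupl <- rep(occurrence, runs$lengths)
```

Here `rep(occurrence, runs$lengths)` does the same job as `inverse.rle()` in the answer: it expands the per-run numbers back to one value per row.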