我遇到了在data.frame中对重复行进行编号而无法找到类似帖子的问题。
假设我们有这样的数据
df <- data.frame(gr=gl(7,2),x=c("a","a","b","b","c","c","a","a","c","c","d","d","a","a"))
> df
gr x
1 1 a
2 1 a
3 2 b
4 2 b
5 3 c
6 3 c
7 4 a
8 4 a
9 5 c
10 5 c
11 6 d
12 6 d
13 7 a
14 7 a
并希望添加名为x_dupl
的新列,以显示第一次出现的x
值编号为1
,第二次出现2
,第三次出现3
等等..
提前感谢!
预期输出
> df
gr x x_dupl
1 1 a 1
2 1 a 1
3 2 b 1
4 2 b 1
5 3 c 1
6 3 c 1
7 4 a 2
8 4 a 2
9 5 c 2
10 5 c 2
11 6 d 1
12 6 d 1
13 7 a 3
14 7 a 3
答案 0 :(得分:2)
您的示例数据(加上输出中gr = 7
的行),并且命名为df1
,而不是df
:
df1 <- data.frame(gr = gl(7,2),
x = c("a","a","b","b","c","c","a","a","c","c","d","d","a","a"))
library(dplyr)
df1 %>%
group_by(x) %>%
mutate(x_dupl = dense_rank(gr)) %>%
ungroup()
# A tibble: 14 x 3
gr x x_dupl
<fctr> <fctr> <int>
1 1 a 1
2 1 a 1
3 2 b 1
4 2 b 1
5 3 c 1
6 3 c 1
7 4 a 2
8 4 a 2
9 5 c 2
10 5 c 2
11 6 d 1
12 6 d 1
13 7 a 3
14 7 a 3
答案 1 :(得分:1)
基础R解决方案:
df <- data.frame(gr=gl(7,2),x=c("a","a","b","b","c","c","a","a","c","c","d","d","a","a"))
x <- rle(as.numeric(df$x))
x$values <- ave(x$values, x$values, FUN = seq_along)
df$x_dupl <- inverse.rle(x)
# gr x x_dupl
# 1 1 a 1
# 2 1 a 1
# 3 2 b 1
# 4 2 b 1
# 5 3 c 1
# 6 3 c 1
# 7 4 a 2
# 8 4 a 2
# 9 5 c 2
# 10 5 c 2
# 11 6 d 1
# 12 6 d 1
# 13 7 a 3
# 14 7 a 3