I have a data frame like this (reproducible example):
dt <- read.table(text = "Email Level
abc Level_6
abc Level_6
abc Level_6
abc Level_6
abc Level_6
xyz Level_5
xyz Level_5
xyz Level_2
xyz Level_2
xyz Level_3
pqr Level_1
pqr Level_4
pqr Level_5
pqr Level_5
pqr Level_1", header = TRUE, stringsAsFactors = TRUE)
> dt
Email Level
1 abc Level_6
2 abc Level_6
3 abc Level_6
4 abc Level_6
5 abc Level_6
6 xyz Level_5
7 xyz Level_5
8 xyz Level_2
9 xyz Level_2
10 xyz Level_3
11 pqr Level_1
12 pqr Level_4
13 pqr Level_5
14 pqr Level_5
15 pqr Level_1
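I want to add a new column Rank that starts at 1 for each Email id and increases only when the value in the Level column changes. If the value stays the same, the rank keeps the previous value.
So the expected output is: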
> dt_expected
Email Level Rank
1 abc Level_6 1
2 abc Level_6 1
3 abc Level_6 1
4 abc Level_6 1
5 abc Level_6 1
6 xyz Level_5 1
7 xyz Level_5 1
8 xyz Level_2 2
9 xyz Level_2 2
10 xyz Level_3 3
11 pqr Level_1 1
12 pqr Level_4 2
13 pqr Level_5 3
14 pqr Level_5 3
15 pqr Level_1 4
How can I achieve this with data.table?
Answer 0 (score: 1)
We group by 'Email' and get the run-length id of the 'Level' column with rleid, which increments whenever adjacent elements differ.
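Alternatively, compare each 'Level' value with the previous one to create a logical index and take its cumulative sum (the second dplyr option).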
library(data.table)
library(dplyr)
dt %>%
group_by(Email) %>%
mutate(Rank = rleid(Level))
# A tibble: 15 x 3
# Groups: Email [3]
# Email Level Rank
# <fct> <fct> <int>
# 1 abc Level_6 1
# 2 abc Level_6 1
# 3 abc Level_6 1
# 4 abc Level_6 1
# 5 abc Level_6 1
# 6 xyz Level_5 1
# 7 xyz Level_5 1
# 8 xyz Level_2 2
# 9 xyz Level_2 2
#10 xyz Level_3 3
#11 pqr Level_1 1
#12 pqr Level_4 2
#13 pqr Level_5 3
#14 pqr Level_5 3
#15 pqr Level_1 4
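To see why rleid gives exactly the ranking asked for, here is a minimal sketch on a single vector, assuming the Level values of the "pqr" group (copied from the question) as input. It shows that a value which reappears later gets a new id rather than its earlier one.

```r
library(data.table)

# Level values of the "pqr" group, in row order
lvl <- c("Level_1", "Level_4", "Level_5", "Level_5", "Level_1")

# rleid() starts at 1 and adds 1 every time a value differs from the one before it,
# so the second run of Level_1 is a new run and gets id 4, not 1
ids <- rleid(lvl)
print(ids)
```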
Or, using the cumulative sum of changes:
dt %>%
group_by(Email) %>%
mutate(Rank = 1 + cumsum(Level != lag(Level, default = first(Level))))
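The cumsum version can be unpacked step by step on one group. This sketch assumes the "pqr" Level values as input and emulates lag(Level, default = first(Level)) with plain base R indexing, so no packages are needed.

```r
# Level values of the "pqr" group, in row order
lvl <- c("Level_1", "Level_4", "Level_5", "Level_5", "Level_1")

# Previous value of each element; the first element is paired with itself,
# mirroring lag(Level, default = first(Level))
prev <- c(lvl[1], lvl[-length(lvl)])

changed <- lvl != prev        # FALSE TRUE TRUE FALSE TRUE
ranks <- 1L + cumsum(changed) # 1 2 3 3 4
print(ranks)
```

Each TRUE marks the start of a new run; cumsum counts how many runs have started so far, and adding 1 makes the first run rank 1.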
data.table
Or with
library(data.table)
setDT(dt)[, Rank := rleid(Level), Email]
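The grouping argument (Email, i.e. by = Email) is what makes the counter restart for each id. A minimal sketch contrasting the two, assuming the question's data rebuilt directly as a data.table; the RankGlobal column is a hypothetical name added only for comparison:

```r
library(data.table)

# Same data as in the question, built directly as a data.table
dt <- data.table(
  Email = rep(c("abc", "xyz", "pqr"), each = 5),
  Level = c(rep("Level_6", 5),
            "Level_5", "Level_5", "Level_2", "Level_2", "Level_3",
            "Level_1", "Level_4", "Level_5", "Level_5", "Level_1")
)

# Without grouping: one running counter over the whole column
dt[, RankGlobal := rleid(Level)]

# With by = Email: the counter restarts at 1 for each Email
dt[, Rank := rleid(Level), by = Email]

print(dt)
```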
base R
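A possible base R sketch, using ave() for the per-Email grouping; this is an illustration, not necessarily the answer's original base R code. It assumes the question's data and works whether Level is a factor or a character column.

```r
# Same data as in the question
dt <- data.frame(
  Email = rep(c("abc", "xyz", "pqr"), each = 5),
  Level = c(rep("Level_6", 5),
            "Level_5", "Level_5", "Level_2", "Level_2", "Level_3",
            "Level_1", "Level_4", "Level_5", "Level_5", "Level_1")
)

# ave() passes the row indices of each Email group to FUN; within a group,
# a new run starts at the first row and wherever Level differs from the row before
dt$Rank <- ave(seq_len(nrow(dt)), dt$Email, FUN = function(i) {
  lvl <- as.character(dt$Level[i])
  cumsum(c(TRUE, lvl[-1] != lvl[-length(lvl)]))
})

print(dt)
```

Passing row indices to ave() rather than Level itself keeps the result integer; averaging over a character column would coerce the ranks back to character.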