I have the following data:
df <- data.frame(week = rep(seq(1, 4, by=1), times = 3) )
week
1 1
2 2
3 3
4 4
5 1
6 2
7 3
8 4
9 1
10 2
11 3
12 4
I want to label each consecutive run of 1:4 with a letter, so that the result looks like this:
week episode
1 1 a
2 2 a
3 3 a
4 4 a
5 1 b
6 2 b
7 3 b
8 4 b
9 1 c
10 2 c
11 3 c
12 4 c
I have tried the following, but it does not distinguish the separate consecutive runs of the sequence 1:4:
data.frame(df, episode = letters[cumsum(c(1L, diff(df$week) > 1L))])
week episode
1 1 a
2 2 a
3 3 a
4 4 a
5 1 a
6 2 a
7 3 a
8 4 a
9 1 a
10 2 a
11 3 a
12 4 a
Answer 0 (score: 1)
If the data is already in sequence, simply take the cumulative sum of the logical vector (week == 1):
library(dplyr)
df %>%
  mutate(episode = letters[cumsum(week == 1)])
# week episode
#1 1 a
#2 2 a
#3 3 a
#4 4 a
#5 1 b
#6 2 b
#7 3 b
#8 4 b
#9 1 c
#10 2 c
#11 3 c
#12 4 c
Or using base R (no additional packages):
df$episode <- letters[cumsum(df$week == 1)]
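The attempt in the question tests diff(df$week) > 1, which is never TRUE for this data because the only steps are +1 (within a run) and -3 (from 4 back to 1). As a minimal sketch of a variant that does not depend on every run starting at exactly 1, a new run can be flagged wherever the value drops. This assumes each run is strictly increasing; the names run_id and episode are only illustrative.

```r
# Same values as df$week in the question
week <- rep(1:4, times = 3)

# A run starts at the first element and wherever the value decreases
run_start <- c(TRUE, diff(week) < 0)

# Cumulative sum of the logical flags gives a run counter 1, 2, 3, ...
run_id <- cumsum(run_start)

# Map the counter to letters (letters has only 26 entries)
episode <- letters[run_id]
```

With more than 26 runs, letters[run_id] yields NA for the extra runs, so a different label set would be needed in that case.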
Answer 1 (score: 1)
Another dplyr possibility is:
df %>%
  mutate(episode = letters[gl(n()/4, 4)])
week episode
1 1 a
2 2 a
3 3 a
4 4 a
5 1 b
6 2 b
7 3 b
8 4 b
9 1 c
10 2 c
11 3 c
12 4 c
Or the same with base R:
df$episode = letters[gl(length(df$week)/4, 4)]
Or:
df %>%
  mutate(episode = letters[ceiling(seq_along(week)/4)])
Or the same with base R:
df$episode = letters[ceiling(seq_along(df$week)/4)]
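Both variants above ignore the values in week and label rows purely by position, so they only give the intended result when every run has exactly 4 rows and the row count is a multiple of 4. A small sketch that makes this explicit and checks that the two index computations agree (run_length is an assumed name for the fixed run size):

```r
# Same values as df$week in the question
week <- rep(1:4, times = 3)

# Assumption: every run has exactly this many rows
run_length <- 4

# gl() builds a factor with levels 1..3, each repeated 4 times;
# as.integer() exposes the codes that letters[...] would index by
block_gl <- as.integer(gl(length(week) / run_length, run_length))

# Position-based alternative: rows 1-4 -> 1, rows 5-8 -> 2, ...
block_ceiling <- ceiling(seq_along(week) / run_length)

# Both approaches should produce the same block index
stopifnot(all(block_gl == block_ceiling))

episode <- letters[block_ceiling]
```

If runs can vary in length, a value-based approach such as cumsum(week == 1) is likely the safer choice.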
Answer 2 (score: 1)
You can use rowid from the data.table package:
library(data.table)
setDT(df)
df[, episode := letters[rowid(week)]]
# week episode
# 1: 1 a
# 2: 2 a
# 3: 3 a
# 4: 4 a
# 5: 1 b
# 6: 2 b
# 7: 3 b
# 8: 4 b
# 9: 1 c
# 10: 2 c
# 11: 3 c
# 12: 4 c
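rowid(week) numbers the occurrences of each distinct value of week, so the k-th time a given week appears it is labelled with the k-th letter. For readers without data.table, a rough base R analogue of the same idea can be sketched with ave(); this is a substitute technique, not what rowid does internally, and it assumes every run contains the same set of week values exactly once.

```r
# Same values as df$week in the question
week <- rep(1:4, times = 3)

# Within each group of equal week values, number the occurrences 1, 2, 3, ...
occurrence <- ave(week, week, FUN = seq_along)

# The occurrence number doubles as the run number under the assumption above
episode <- letters[occurrence]
```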