I have this data frame:
date <- structure(c(8664, 8808, 8819, 8899, 8995, 9002, 9006, 9025, 9054,
9054, 9060, 9064, 9125, 9232, 9254, 9301, 9322, 9338, 9356, 9357,
9364, 9369, 9369, 9370, 9372, 9372, 9376, 9376, 9376, 9388), class = "Date")
score <- c(2, 1, 1, 1, 2, 1, 2, 4, 2, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 1, 1, 1, 2, 2, 2, 1, 1)
df <- data.frame(date, score)
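I want to find, for each row, the number of consecutive rows that share the same score. For example, the longest run for score 1 is 8 dates (see rows 13-20). The data frame below is the output I need. How can I produce it?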
# date score streak
# 1 1993-09-21 2 1
# 2 1994-02-12 1 3
# 3 1994-02-23 1 3
# 4 1994-05-14 1 3
# 5 1994-08-18 2 1
# 6 1994-08-25 1 1
# 7 1994-08-29 2 1
# 8 1994-09-17 4 1
# 9 1994-10-16 2 1
# 10 1994-10-16 2 1
# 11 1994-10-22 1 1
# 12 1994-10-26 2 1
# 13 1994-12-26 1 8
# 14 1995-04-12 1 8
# 15 1995-05-04 1 8
# 16 1995-06-20 1 8
# 17 1995-07-11 1 8
# 18 1995-07-27 1 8
# 19 1995-08-14 1 8
# 20 1995-08-15 1 8
# 21 1995-08-22 2 2
# 22 1995-08-27 2 2
# 23 1995-08-27 1 3
# 24 1995-08-28 1 3
# 25 1995-08-30 1 3
# 26 1995-08-30 2 3
# 27 1995-09-03 2 3
# 28 1995-09-03 2 3
# 29 1995-09-03 1 2
# 30 1995-09-15 1 2
Answer 0 (score: 3)
We can use rle from base R and repeat each element of its lengths component lengths times.
x <- rle(df$score)
df$streak <- rep(x$lengths, x$lengths)
df$streak
#[1] 1 3 3 3 1 1 1 1 2 2 1 1 8 8 8 8 8 8 8 8 2 2 3 3 3 3 3 3 2 2
Here x holds the value of each run (values) and how many times it repeats (lengths):
x
#Run Length Encoding
#lengths: int [1:14] 1 3 1 1 1 1 2 1 1 8 ...
#values : num [1:14] 2 1 2 1 2 4 2 1 2 1 ...
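To make the rep(x$lengths, x$lengths) step more concrete, here is a minimal sketch on a small toy vector. The vector s and the names r, streak and longest are made up for illustration and are not part of the question's data. The tapply call is one possible way to get the longest run per score, which the question mentions in passing.

```r
# Hypothetical toy vector, not the data from the question
s <- c(2, 1, 1, 1, 2, 2, 1)

r <- rle(s)
# r$lengths holds the size of each run, r$values the score of each run

# Repeat each run length once per row of that run,
# so every row carries the size of the run it belongs to
streak <- rep(r$lengths, times = r$lengths)

# Longest run per distinct score (assumption: this is what
# "maximum consecutive dates for a score" means in the question)
longest <- tapply(r$lengths, r$values, max)

print(streak)
print(longest)
```

Because the lengths always sum to the length of the input, the expanded vector lines up row by row with the original column.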
Answer 1 (score: 1)
Here is an option using rleid from data.table:
library(data.table)
setDT(df)[, streak := .N, rleid(score)]
df
# date score streak
# 1: 1993-09-21 2 1
# 2: 1994-02-12 1 3
# 3: 1994-02-23 1 3
# 4: 1994-05-14 1 3
# 5: 1994-08-18 2 1
# 6: 1994-08-25 1 1
# 7: 1994-08-29 2 1
# 8: 1994-09-17 4 1
# 9: 1994-10-16 2 2
#10: 1994-10-16 2 2
#11: 1994-10-22 1 1
#12: 1994-10-26 2 1
#13: 1994-12-26 1 8
#14: 1995-04-12 1 8
#15: 1995-05-04 1 8
#16: 1995-06-20 1 8
#17: 1995-07-11 1 8
#18: 1995-07-27 1 8
#19: 1995-08-14 1 8
#20: 1995-08-15 1 8
#21: 1995-08-22 2 2
#22: 1995-08-27 2 2
#23: 1995-08-27 1 3
#24: 1995-08-28 1 3
#25: 1995-08-30 1 3
#26: 1995-08-30 2 3
#27: 1995-09-03 2 3
#28: 1995-09-03 2 3
#29: 1995-09-03 1 2
#30: 1995-09-15 1 2
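For readers without data.table, the grouping idea can be sketched in base R. This is an approximation of the run-id approach under stated assumptions, not data.table's implementation: the toy vector s and the names run_id and streak are hypothetical, and the comparison assumes the scores contain no NA values.

```r
# Hypothetical toy vector, not the data from the question
s <- c(2, 1, 1, 1, 2, 2, 1)

# Start a new run id whenever the value differs from the previous one
# (assumes no NA values in s)
run_id <- cumsum(c(TRUE, s[-1] != s[-length(s)]))

# Count the rows in each run and broadcast the count back to every row
streak <- ave(s, run_id, FUN = length)

print(run_id)
print(streak)
```

Here ave plays the role of the grouped .N assignment: it computes the group size once per run id and returns a vector as long as the input.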