Question

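I want to determine whether an activity occurs continuously and how often it occurs within a week. The starting point is t1, and occurrences are recorded in the columns t1_1, t1_2, t1_3, and so on. For example, for id 12 the activity occurs at t1_2, t1_3, t2_2, t3_1, t3_3, t4_2, t5_2, t6_1, t6_2, t6_3 and t7_3. Since activity is reported on all 7 days here, I treat the activity as occurring continuously. I want to identify all IDs for which the activity occurs continuously, together with the total number of occurrences.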

Input

id  t1_1 t1_2 t1_3 t2_1 t2_2 t2_3 t3_1 t3_2 t3_3 t4_1 t4_2 t4_3 t5_1 t5_2 t5_3 t6_1 t6_2 t6_3 t7_1 t7_2 t7_3
12  0    1    1    0    1    0    1    0    1    0    1    0    0    1    0    1    1    1    0    0    1
123 0    0    0    1    1    1    0    0    0    1    1    1    1    1    1    0    0    0    1    1    1
10  1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1

Output

Id  Sum
12  11
10  21

Answer 1

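Here is an option with rle. Loop over the rows of the dataset without the 'id' column using apply (MARGIN = 1), apply rle, and extract the lengths of the runs whose values are 1 ('x1'). If the length of 'x1' is 1 or is greater than or equal to 7, get the sum (a length of 1 covers the case where all values are 1). Then stack the named list into a 2-column data.frame and set the column names ('out').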
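
To make the rle step concrete, here is a small sketch that applies it to the row for id 12 from the question. The vector x is typed in by hand from the input table, and the name x1 follows the description above; r is an illustrative name.

```r
# Row for id 12, copied from the question's input (columns t1_1 ... t7_3)
x <- c(0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1)

# rle() returns the value of each run and how long the run is
r <- rle(x)

# Lengths of the runs whose value is 1
x1 <- r$lengths[as.logical(r$values)]
x1          # 2 1 1 1 1 1 3 1
length(x1)  # 8 runs of 1s
sum(x1)     # 11 occurrences in total
```

Eight runs satisfy the >= 7 condition, so id 12 is kept with a sum of 11.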

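# df1 is the question's input table as a data.frame (its definition is not shown in this answer)
out <- stack(setNames(apply(df1[-1], 1, function(x) {
  # lengths of the runs of 1s in this row
  x1 <- with(rle(x), lengths[as.logical(values)])
  # keep the row if it has at least 7 runs of 1s, or a single run (all values are 1)
  if (length(x1) >= 7 || length(x1) == 1) sum(x1)
}), df1$id))[2:1]
names(out) <- c('Id', 'Sum')
out
#  Id Sum
#1 12  11
#2 10  21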
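
The run count above is a proxy for "active on all 7 days" rather than a direct test. It gives the expected result on this input, but a row with activity in every slot of days 1 to 6 and none on day 7 would form a single run and still pass. As a hedged alternative, the following base R sketch checks each day directly. It assumes the tN_M column naming from the question; the names vals, day, per_day, all_days and result are illustrative and not part of the original answer.

```r
# Assumption: df1 is the question's input table read into a data.frame
df1 <- read.table(header = TRUE, text = "
id t1_1 t1_2 t1_3 t2_1 t2_2 t2_3 t3_1 t3_2 t3_3 t4_1 t4_2 t4_3 t5_1 t5_2 t5_3 t6_1 t6_2 t6_3 t7_1 t7_2 t7_3
12 0 1 1 0 1 0 1 0 1 0 1 0 0 1 0 1 1 1 0 0 1
123 0 0 0 1 1 1 0 0 0 1 1 1 1 1 1 0 0 0 1 1 1
10 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
")

vals <- as.matrix(df1[-1])

# Day label of each column, e.g. "t1_2" -> "t1"
day <- sub("_.*$", "", colnames(vals))

# Occurrences per id (rows) and per day (columns)
per_day <- t(rowsum(t(vals), group = day))

# An id qualifies when every day has at least one occurrence
all_days <- rowSums(per_day > 0) == ncol(per_day)

result <- data.frame(Id = df1$id[all_days], Sum = rowSums(vals)[all_days])
result
#   Id Sum
# 1 12  11
# 2 10  21
```

Grouping the columns by day keeps the condition tied to the question's definition (activity reported on all 7 days), independent of how the 1s happen to cluster into runs.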

Answer 2

An option using data.table:

# Requires library(data.table); DT is built in the data section below
melt(DT, id.vars="id")[,
    c("day", "time") := tstrsplit(variable, "_")][
        value==1L, if(all(paste0("t", 1L:7L) %chin% day)) .(Sum=sum(value)), id]

Output:

   id Sum
1: 10  21
2: 12  11

Data:

library(data.table)
DT <- fread("id t1_1 t1_2 t1_3 t2_1 t2_2 t2_3 t3_1 t3_2 t3_3 t4_1 t4_2 t4_3 t5_1 t5_2 t5_3 t6_1 t6_2 t6_3 t7_1 t7_2 t7_3
12  0    1     1    0     1   0    1    0    1    0    1    0    0    1    0     1   1     1   0      0  1
123 0    0     0    1     1   1    0    0    0    1    1    1    1    1    1     0   0     0    1     1  1
10  1   1     1    1     1    1    1   1    1    1    1    1    1    1    1     1   1     1    1     1  1")

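Explanation:

Use melt to reshape the data from wide to long format
Use tstrsplit to split the column names into day of the week and time
Filter for value == 1L, then for each id check that all 7 days are present in the subset before summing (i.e. if(all(paste0("t", 1L:7L) %chin% day)) .(Sum=sum(value)))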
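
The steps above can be sketched one at a time, which makes the intermediate long table easy to inspect. This assumes the data.table package is installed; the object names long, active and res are illustrative and do not appear in the original answer.

```r
library(data.table)

DT <- fread(text = "id t1_1 t1_2 t1_3 t2_1 t2_2 t2_3 t3_1 t3_2 t3_3 t4_1 t4_2 t4_3 t5_1 t5_2 t5_3 t6_1 t6_2 t6_3 t7_1 t7_2 t7_3
12 0 1 1 0 1 0 1 0 1 0 1 0 0 1 0 1 1 1 0 0 1
123 0 0 0 1 1 1 0 0 0 1 1 1 1 1 1 0 0 0 1 1 1
10 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1")

# Step 1: reshape wide to long, one row per id and column
long <- melt(DT, id.vars = "id")

# Step 2: split names such as "t3_1" into day "t3" and time "1"
long[, c("day", "time") := tstrsplit(variable, "_")]

# Step 3: keep only the rows with activity
active <- long[value == 1L]
active[id == 123, unique(day)]  # "t2" "t4" "t5" "t7": three days are missing

# Step 4: per id, return the sum only when all 7 days are present
res <- active[, if (all(paste0("t", 1:7) %chin% day)) .(Sum = sum(value)), by = id]
res
```

Because the if has no else branch, groups that fail the check return NULL and are dropped from the result, which is why id 123 does not appear.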
