Modified data:
structure(list(hour = c(0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L,
1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L), cs = c(0L, 0L, 0L, 0L,
0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L
), cs_acum = c(0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 2L, 0L, 0L), cs_wanted = c(0L, 0L, 0L, 0L,
0L, 1L, 2L, 3L, 0L, 0L, 4L, 5L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L,
3L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 0L, 0L
), cs_acum2 = c(0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 0L, 0L, 4L, 5L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 2L, 3L, 0L, 4L, 5L, 0L, 0L)), .Names = c("hour", "cs", "cs_acum",
"cs_wanted", "cs_acum2"), class = c("data.table", "data.frame"
), row.names = c(NA, -36L))
`cs_acum` is the cumulative sum of `cs`, restarting at 0.
df1$cs_acum <- with(df1, ave(df1$cs, cumsum(df1$cs == 0), FUN = cumsum))
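As a minimal illustration of how this line resets the running total: `cumsum(cs == 0)` increases at every zero, so each zero opens a new group, and `ave()` then applies `cumsum` within each group. The short `cs` vector below is made up for the example and is not taken from the data above.

```r
# Hypothetical toy vector, only for illustration
cs <- c(0L, 1L, 1L, 1L, 0L, 1L, 1L)

# Group id increases at every 0, so each 0 opens a new group
grp <- cumsum(cs == 0)

# Cumulative sum of cs within each group: restarts at every 0
cs_acum <- ave(cs, grp, FUN = cumsum)
print(cs_acum)
# 0 1 2 3 0 1 2
```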
If `hour` has a value of 1 within the 5 rows after the accumulation of `cs` has stopped, I need to continue this accumulation.
The desired output is in column `cs_wanted`.
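Further explanation: `cs_acum` is the accumulation of hours (rows where `cs` is 1) that meet a specific condition. Once it stops, what matters is no longer `cs` but the column `hour`: if there is a value of 1 within the 5-hour window after the stop, the accumulation should continue.
A new function might be in order: one that checks the five rows in `hour` starting from the position where `cs_acum` turns to 0, and continues accumulating from the position where `cs_acum` stopped.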
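Possible steps:
Find the position where the accumulation stopped.
Look at the next five rows in `hour`.
If there is a value of 1, continue the accumulation for that row,
then look again at the following five hours.
If there is no value of 1, do nothing.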
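The steps above can be sketched as a plain base R loop. This is only one possible reading of the rules, not the approach used in the answers: the function name `continue_cumsum` and its `window` argument are hypothetical, and it assumes that "within five rows" means a 1 in `hour` must appear in one of the five rows directly after the last counted row. The vectors are the `hour`, `cs` and `cs_wanted` columns of the 17-row example dataset.

```r
# Hypothetical helper: continue a cs-started count while hour == 1
# shows up again within `window` rows of the last counted row.
continue_cumsum <- function(hour, cs, window = 5L) {
  out <- integer(length(hour))
  count <- 0L  # running total of the current accumulation
  gap <- 0L    # rows seen since the last counted row
  for (i in seq_along(hour)) {
    # Window expired without a 1: drop the accumulation
    if (count > 0L && gap >= window) count <- 0L
    if (cs[i] == 1 || (count > 0L && hour[i] == 1)) {
      count <- count + 1L
      gap <- 0L
      out[i] <- count
    } else {
      gap <- gap + 1L
    }
  }
  out
}

hour <- c(1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L)
cs <- c(1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L)
cs_wanted <- c(1L, 2L, 3L, 0L, 0L, 4L, 5L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 0L)

result <- continue_cumsum(hour, cs)
print(identical(result, cs_wanted))
# TRUE
```

A loop like this is easy to follow but slow on long vectors, which is one reason to prefer a vectorised grouping approach.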
New data:
df3 <- structure(list(hour = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
cs = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
cs_acum = c(0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13),
cs_acum2 = c(0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 0, 0, 0, 8, 9, 10, 11, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28)),
.Names = c("hour", "cs", "cs_acum", "cs_acum2"), class = "data.frame", row.names = c(NA, -68L))
Answer 0 (score: 6)
Using:
library(data.table)
rl <- rle(df1$hour)
setDT(df1)[, grp := rleid(rep(rl$lengths >5 & rl$values == 0, rl$lengths))
][hour == 1, cs_acum2 := cumsum(hour), grp
][is.na(cs_acum2), cs_acum2 := 0][]
gives:
hour cs cs_acum cs_wanted grp cs_acum2
1: 1 1 1 1 1 1
2: 1 1 2 2 1 2
3: 1 1 3 3 1 3
4: 0 0 0 0 1 0
5: 0 0 0 0 1 0
6: 1 0 0 4 1 4
7: 1 0 0 5 1 5
8: 0 0 0 0 2 0
9: 0 0 0 0 2 0
10: 0 0 0 0 2 0
11: 0 0 0 0 2 0
12: 0 0 0 0 2 0
13: 0 0 0 0 2 0
14: 1 1 1 1 3 1
15: 1 1 2 2 3 2
16: 1 1 3 3 3 3
17: 0 0 0 0 3 0
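To see how the `grp` column in the output above comes about, the grouping step can be run on its own. The sketch below uses base R only, so `data.table::rleid()` is replaced by a `cumsum()` over change points as a stand-in; the variable names `long_gap_run` and `long_gap` are introduced here for illustration.

```r
# hour column of the 17-row example dataset
hour <- c(1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L)

rl <- rle(hour)
# TRUE only for runs of more than 5 zeros
long_gap_run <- rl$lengths > 5 & rl$values == 0
# Expand the per-run flag back to one value per row
long_gap <- rep(long_gap_run, rl$lengths)
# Base R stand-in for rleid(): a new id whenever the flag changes
grp <- cumsum(c(TRUE, long_gap[-1] != long_gap[-length(long_gap)]))
print(grp)
# 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3
```

Short runs of zeros (rows 4 and 5) stay inside group 1, which is what lets the count continue at rows 6 and 7.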
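Explanation:
`setDT(df1)` converts the data frame to a data.table.
With `rl <- rle(df1$hour)` and `grp := rleid(rep(rl$lengths > 5 & rl$values == 0, rl$lengths))` you create a grouping variable that only changes at a run of more than 5 zeros.
With `hour == 1` you filter, and then you create a cumulative sum with `cumsum(hour)`. If the values in `hour` are only 1s and 0s, you could also create a counter with `seq_along` or `1:.N`, which gives the same result.
With `is.na(cs_acum2), cs_acum2 := 0` you change the NAs to zero.
Update 1: For the new example data (`df2`):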
rl2 <- rle(df2$hour)
setDT(df2)[, `:=` (rn = .I, grp = rleid(rep(rl2$lengths >5 & rl2$values == 0, rl2$lengths)))
][hour == 1 & rn >= df2[, .I[cs == 1]][1], cs_acum2 := cumsum(hour), grp
][is.na(cs_acum2), cs_acum2 := 0][, c('rn','grp') := NULL][]
gives:
hour cs cs_acum cs_wanted cs_acum2
1: 0 0 0 0 0
2: 1 0 0 0 0
3: 1 0 0 0 0
4: 1 0 0 0 0
5: 0 0 0 0 0
6: 1 1 1 1 1
7: 1 1 2 2 2
8: 1 1 3 3 3
9: 0 0 0 0 0
10: 0 0 0 0 0
11: 1 0 0 4 4
12: 1 0 0 5 5
13: 0 0 0 0 0
14: 0 0 0 0 0
15: 0 0 0 0 0
16: 0 0 0 0 0
17: 0 0 0 0 0
18: 0 0 0 0 0
19: 1 1 1 1 1
20: 1 1 2 2 2
21: 1 1 3 3 3
22: 0 0 0 0 0
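The restriction used in the Update 1 code (rows are only counted from the first `cs == 1` onward) can also be checked in base R. This is a rough translation for illustration, not the data.table code itself: `first_cs` and `keep` are names introduced here, and the vectors are the `hour` and `cs` columns of the 22 rows printed above. It assumes `cs` contains at least one 1.

```r
hour <- c(0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L,
          0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L)
cs <- c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L,
        0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L)

# Grouping variable: changes only at runs of more than 5 zeros in hour
rl <- rle(hour)
long_gap <- rep(rl$lengths > 5 & rl$values == 0, rl$lengths)
grp <- cumsum(c(TRUE, long_gap[-1] != long_gap[-length(long_gap)]))

# Row number of the first cs == 1; earlier rows are never counted
first_cs <- which(cs == 1)[1]
keep <- hour == 1 & seq_along(hour) >= first_cs

# Cumulative count of the kept rows within each group, 0 elsewhere
cs_acum2 <- integer(length(hour))
cs_acum2[keep] <- ave(hour[keep], grp[keep], FUN = cumsum)
print(cs_acum2)
# 0 0 0 0 0 1 2 3 0 0 4 5 0 0 0 0 0 0 1 2 3 0
```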
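The way I understand it, the `cumsum` of `hour` is only allowed to start after the first occurrence of `cs == 1`.
Additional explanation:
With `rn = .I` you create a row index number.
`df2[, .I[cs == 1]][1]` gives the row number of the first time that `cs == 1`.
With `rn >= df2[, .I[cs == 1]][1]` you select only the rows from that point onward.
Update 2: With regard to the latest (fourth) dataset, you could do it like this: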
rl4 <- rle(df4$hour)
setDT(df4)[, grp := rleid(rep(rl4$lengths >5 & rl4$values == 0, rl4$lengths))]
i1 <- df4[, .I[cs == 1][1], grp][!is.na(V1)]$V1
i2 <- df4[, .I[1:.N==5], rleid(cs)]$V1[-1] + 1
df4[i1, cs.inc := 1
][i2, cs.inc := -1
][is.na(cs.inc), cs.inc := 0
][, cs.inc := cumsum(cs.inc)
][hour == 1 & cs.inc == 1, cs_acum3 := cumsum(hour), grp
][is.na(cs_acum3), cs_acum3 := 0][, c('grp','cs.inc') := NULL][]
gives:
hour cs cs_acum cs_wanted cs_acum2 cs_acum3
1: 0 0 0 0 0 0
2: 1 0 0 0 0 0
3: 1 0 0 0 0 0
4: 1 0 0 0 0 0
5: 0 0 0 0 0 0
6: 1 1 1 1 1 1
7: 1 1 2 2 2 2
8: 1 1 3 3 3 3
9: 0 0 0 0 0 0
10: 0 0 0 0 0 0
11: 1 0 0 4 4 4
12: 1 0 0 5 5 5
13: 0 0 0 0 0 0
14: 0 0 0 0 0 0
15: 0 0 0 0 0 0
16: 0 0 0 0 0 0
17: 0 0 0 0 0 0
18: 0 0 0 0 0 0
19: 1 1 1 1 1 1
20: 1 1 2 2 2 2
21: 1 1 3 3 3 3
22: 0 0 0 0 0 0
23: 0 0 0 0 0 0
24: 0 0 0 0 0 0
25: 0 0 0 0 0 0
26: 0 0 0 0 0 0
27: 0 0 0 0 0 0
28: 0 0 0 0 0 0
29: 1 0 0 0 1 0
30: 1 0 0 0 2 0
31: 1 0 0 0 3 0
32: 0 0 0 0 0 0
33: 1 1 1 1 4 1
34: 1 1 2 2 5 2
35: 0 0 0 0 0 0
36: 0 0 0 0 0 0
Used data
First example dataset:
df1 <- structure(list(hour = c(1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L),
cs = c(1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L),
cs_acum = c(1L, 2L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 0L),
cs_wanted = c(1L, 2L, 3L, 0L, 0L, 4L, 5L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 0L)),
.Names = c("hour", "cs", "cs_acum", "cs_wanted"), class = "data.frame", row.names = c(NA, -17L))
Answer 1 (score: 1)
We can try this using only data.table methods:
library(data.table)
setDT(df1)[, grp := shift(cumsum(hour == 1 & (Reduce(`+`,
shift(hour, 1:5, fill = 1, type = "lead"))==0)), fill=0)
][hour ==1, cs_acum1 := cumsum(hour) , grp
][is.na(cs_acum1), cs_acum1 := 0][, grp := NULL][]
# hour cs cs_acum cs_wanted cs_acum1
# 1: 1 1 1 1 1
# 2: 1 1 2 2 2
# 3: 1 1 3 3 3
# 4: 0 0 0 0 0
# 5: 0 0 0 0 0
# 6: 1 0 0 4 4
# 7: 1 0 0 5 5
# 8: 0 0 0 0 0
# 9: 0 0 0 0 0
#10: 0 0 0 0 0
#11: 0 0 0 0 0
#12: 0 0 0 0 0
#13: 0 0 0 0 0
#14: 1 1 1 1 1
#15: 1 1 2 2 2
#16: 1 1 3 3 3
#17: 0 0 0 0 0
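Explanation
We convert the 'data.frame' to a 'data.table' (`setDT(df1)`) and create a grouping variable from the `lead` values of 'hour', which encodes the condition in the OP's post. We then specify 'i' (`hour == 1`), group by 'grp', and assign (`:=`) the `cumsum` of 'hour' as 'cs_acum1', change the NA elements to 0, and finally remove 'grp' by assigning it to NULL.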
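The lead-based grouping can be followed step by step in base R. This is a sketch for illustration rather than the data.table code itself: `shift(..., type = "lead")` is imitated by padding `hour` with five 1s, and the names `padded`, `lead_sum` and `closes` are introduced here. The `hour` vector is the one from the 17-row example dataset.

```r
hour <- c(1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L)
n <- length(hour)

# Pad with 1s so the last rows never see an all-zero window (mirrors fill = 1)
padded <- c(hour, rep(1L, 5))
# Sum of the next five hour values for every row
lead_sum <- vapply(seq_len(n), function(i) sum(padded[(i + 1):(i + 5)]), integer(1))

# A row closes an accumulation when it is a 1 followed by five 0s
closes <- hour == 1 & lead_sum == 0
# Shift by one row so the closing row still belongs to its own group
grp <- c(0L, cumsum(closes)[-n])

# Count the hour == 1 rows within each group, 0 elsewhere
cs_acum1 <- integer(n)
cs_acum1[hour == 1] <- ave(hour[hour == 1], grp[hour == 1], FUN = cumsum)
print(cs_acum1)
# 1 2 3 0 0 4 5 0 0 0 0 0 0 1 2 3 0
```

Only row 7 is followed by five zeros, so it is the single point at which the group id increases.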