I have a data table like this:
timestamp Value
05-01-2020 12:07:08 8
05-01-2020 12:36:05 9
05-01-2020 23:45:02 10.3
05-01-2020 13:44:33 11
06-01-2020 01:07:08 12.5
06-01-2020 10:23:05 11.3
06-01-2020 12:11:08 10.8
06-01-2020 22:06:12 9.7
07-01-2020 00:01:05 9.3
07-01-2020 02:17:09 8.6
07-01-2020 12:36:05 8.3
07-01-2020 12:07:08 7.8
07-01-2020 12:36:05 8.7
07-01-2020 12:36:05 9.3
08-01-2020 12:36:05 9.8
08-01-2020 12:36:05 10.4
08-01-2020 12:36:05 10.5
09-01-2020 12:36:05 10.3
09-01-2020 12:07:08 9.6
09-01-2020 12:36:05 9.1
11-01-2020 12:07:08 8.8
11-01-2020 12:36:05 8.3
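I am trying to create three groups, G1, G2 and G3:
- G1 for an upward (ascending) trend in values from 9 to 12, both bounds inclusive;
- G2 for a downward (descending) trend in values from 12 to 9, both bounds inclusive;
- G3 for values in the range 9 to 12, both bounds inclusive.
The grouping should be done per day.
I was able to get G3, but the other conditions do not work: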
df[, G3 := rleid((Value >= 9 & Value <= 12) & (as.IDate(timestamp, "%d-%m-%Y") == shift(as.IDate(timestamp, "%d-%m-%Y"), type = "lag")))]
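A minimal sketch of why folding the day check into a single logical, as in the attempt above, does not reliably restart the groups at a day change. It uses rows 8 to 10 of the table above (dates rewritten in ISO format for simplicity) and assumes the value column is named `Value`. The in-range 9.3 on 07-01-2020 ends up in the same run as the out-of-range 8.6 that follows it, whereas passing the date as a separate argument to `rleid()` starts a new id whenever either the day or the in-range flag changes.

```r
library(data.table)

# Rows 8-10 of the question's data: 06-01-2020 9.7, 07-01-2020 9.3, 07-01-2020 8.6
dt <- data.table(
  Date  = as.IDate(c("2020-01-06", "2020-01-07", "2020-01-07")),
  Value = c(9.7, 9.3, 8.6)
)

in_range <- dt[, between(Value, 9, 12)]               # TRUE TRUE FALSE
same_day <- dt[, Date == shift(Date, type = "lag")]   # NA FALSE TRUE

# Day check folded into one logical: rows 2 and 3 are both FALSE
# (day changed / value out of range), so they share one run id.
combined <- rleid(in_range & same_day)

# Date passed as its own argument: a new id starts whenever either the
# day or the in-range flag changes.
separate <- rleid(dt$Date, in_range)

print(combined)  # 1 2 2
print(separate)  # 1 2 3
```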
Desired output:
timestamp Value G1 G2 G3
05-01-2020 12:07:08 8 1 1 1
05-01-2020 12:36:05 9 2 1 2
05-01-2020 23:45:02 10.3 2 1 2
05-01-2020 13:44:33 11 2 1 2
06-01-2020 01:07:08 12.5 3 1 3
06-01-2020 10:23:05 11.3 4 2 4
06-01-2020 12:11:08 10.8 4 2 4
06-01-2020 22:06:12 9.7 4 2 4
07-01-2020 00:01:05 9.3 5 3 5
07-01-2020 02:17:09 8.6 6 4 6
07-01-2020 12:36:05 8.3 6 4 6
07-01-2020 12:07:08 7.8 6 4 6
07-01-2020 12:36:05 8.7 6 4 6
07-01-2020 12:36:05 9.3 7 4 7
08-01-2020 12:36:05 9.8 8 4 8
08-01-2020 12:36:05 10.4 8 4 8
08-01-2020 12:36:05 10.5 8 4 8
09-01-2020 12:36:05 10.3 9 5 8
09-01-2020 12:07:08 9.6 9 5 9
09-01-2020 12:36:05 9.1 9 5 9
11-01-2020 12:07:08 8.8 9 6 10
11-01-2020 12:36:05 8.3 9 6 10
Answer 0 (score: 0)
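Here is an answer to an interesting question.
It is hard to tell whether this answer is correct, because the expected result posted in the question seems to be inconsistent even with the OP's own set of rules and the OP's own code (e.g., in places it does not take the change of day into account).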
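To make that inconsistency concrete, here is a small check (a sketch, not part of the original answer) on rows 15 to 20 of the data, with the dates rewritten in ISO format. Under the stated per-day rule, the 10.3 recorded on 09-01-2020 should start a new G3 group, yet the desired output gives it the same G3 (8) as the three rows of 08-01-2020 and a different one from the other two rows of its own day.

```r
library(data.table)

# Rows 15-20 of the question's data (08-01-2020 and 09-01-2020) and the
# G3 values listed for them in the desired output.
days <- data.table(
  Date       = as.IDate(rep(c("2020-01-08", "2020-01-09"), each = 3)),
  Value      = c(9.8, 10.4, 10.5, 10.3, 9.6, 9.1),
  G3_desired = c(8, 8, 8, 8, 9, 9)
)

# G3 as stated in the question: runs of in-range values, restarted each day.
days[, G3_rule := rleid(Date, between(Value, 9, 12))]
# Renumber the desired ids from 1 so the two groupings can be compared.
days[, G3_wanted := rleid(G3_desired)]

# Rows where the rule and the desired output disagree.
print(days[G3_rule != G3_wanted])
```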
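library(data.table)
# Calendar day of each row (timestamps are in dd-mm-yyyy format)
setDT(df)[, Date := as.IDate(timestamp, "%d-%m-%Y")]
# Difference to the next value within the same day; the last row of each day
# carries forward the previous difference
df[, Change := nafill(c(diff(Value), NA), "locf"), by = Date]
df[, `:=`(
  G1 = rleid(Date, between(Value, 9, 12) & Change > 0),  # rising, within [9, 12]
  G2 = rleid(Date, between(Value, 9, 12) & Change < 0),  # falling, within [9, 12]
  G3 = rleid(Date, between(Value, 9, 12)),               # within [9, 12]
  Date = NULL,    # drop the helper columns
  Change = NULL
)][]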
              timestamp Value G1 G2 G3
 1: 05-01-2020 12:07:08   8.0  1  1  1
 2: 05-01-2020 12:36:05   9.0  2  1  2
 3: 05-01-2020 23:45:02  10.3  2  1  2
 4: 05-01-2020 13:44:33  11.0  2  1  2
 5: 06-01-2020 01:07:08  12.5  3  2  3
 6: 06-01-2020 10:23:05  11.3  3  3  4
 7: 06-01-2020 12:11:08  10.8  3  3  4
 8: 06-01-2020 22:06:12   9.7  3  3  4
 9: 07-01-2020 00:01:05   9.3  4  4  5
10: 07-01-2020 02:17:09   8.6  4  5  6
11: 07-01-2020 12:36:05   8.3  4  5  6
12: 07-01-2020 12:07:08   7.8  4  5  6
13: 07-01-2020 12:36:05   8.7  4  5  6
14: 07-01-2020 12:36:05   9.3  5  5  7
15: 08-01-2020 12:36:05   9.8  6  6  8
16: 08-01-2020 12:36:05  10.4  6  6  8
17: 08-01-2020 12:36:05  10.5  6  6  8
18: 09-01-2020 12:36:05  10.3  7  7  9
19: 09-01-2020 12:07:08   9.6  7  7  9
20: 09-01-2020 12:36:05   9.1  7  7  9
21: 11-01-2020 12:07:08   8.8  8  8 10
22: 11-01-2020 12:36:05   8.3  8  8 10
              timestamp Value G1 G2 G3

Data:
library(data.table)
df <- fread(
"timestamp time Value
05-01-2020 12:07:08 8
05-01-2020 12:36:05 9
05-01-2020 23:45:02 10.3
05-01-2020 13:44:33 11
06-01-2020 01:07:08 12.5
06-01-2020 10:23:05 11.3
06-01-2020 12:11:08 10.8
06-01-2020 22:06:12 9.7
07-01-2020 00:01:05 9.3
07-01-2020 02:17:09 8.6
07-01-2020 12:36:05 8.3
07-01-2020 12:07:08 7.8
07-01-2020 12:36:05 8.7
07-01-2020 12:36:05 9.3
08-01-2020 12:36:05 9.8
08-01-2020 12:36:05 10.4
08-01-2020 12:36:05 10.5
09-01-2020 12:36:05 10.3
09-01-2020 12:07:08 9.6
09-01-2020 12:36:05 9.1
11-01-2020 12:07:08 8.8
11-01-2020 12:36:05 8.3")[
, `:=`(timestamp = paste(timestamp, time), time = NULL)]
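The answer relies on three data.table helpers. Below is a minimal sketch of what each one does on small made-up vectors (not the question's data): `nafill(..., "locf")` fills the trailing `NA` left by `c(diff(x), NA)`, `between()` includes both bounds by default, and `rleid()` with several arguments starts a new id whenever any of them changes, which is what restarts the groups on each new day.

```r
library(data.table)

# Toy vectors, made up for illustration.

# Trend direction: difference to the next value, with the last element
# carrying forward the previous difference ("locf").
v <- c(8, 9, 11)
change <- nafill(c(diff(v), NA), "locf")
print(change)  # 1 2 2

# between() includes both bounds by default.
flags <- between(c(8.9, 9, 12, 12.5), 9, 12)
print(flags)   # FALSE TRUE TRUE FALSE

# rleid() with several arguments starts a new id whenever any argument changes.
ids <- rleid(c(1, 1, 2, 2), c(TRUE, TRUE, TRUE, FALSE))
print(ids)     # 1 1 2 3
```

Note that `Change` depends on row order within each day, so the result assumes the rows are already in the intended order.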