Grouping a data.table in R by rising trend, falling trend, and range

Posted: 2020-05-18 10:13:58

Tags: r data.table

I have a data.table like this:

timestamp           Value
05-01-2020  12:07:08    8
05-01-2020  12:36:05    9
05-01-2020  23:45:02    10.3
05-01-2020  13:44:33    11
06-01-2020  01:07:08    12.5
06-01-2020  10:23:05    11.3
06-01-2020  12:11:08    10.8
06-01-2020  22:06:12    9.7
07-01-2020  00:01:05    9.3
07-01-2020  02:17:09    8.6
07-01-2020  12:36:05    8.3
07-01-2020  12:07:08    7.8
07-01-2020  12:36:05    8.7
07-01-2020  12:36:05    9.3
08-01-2020  12:36:05    9.8
08-01-2020  12:36:05    10.4
08-01-2020  12:36:05    10.5
09-01-2020  12:36:05    10.3
09-01-2020  12:07:08    9.6
09-01-2020  12:36:05    9.1
11-01-2020  12:07:08    8.8
11-01-2020  12:36:05    8.3

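I am trying to create three groups, G1, G2 and G3: G1 for a rising (ascending) trend of values from 9 to 12, G2 for a falling (descending) trend of values from 12 to 9, and G3 for values in the range 9 to 12. Both bounds are inclusive in all three cases.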

The grouping should be done per day.
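To make the three rules concrete, here is a minimal sketch on a small hypothetical table (not the question's data). It assumes that "rising" or "falling" means the next reading of the same day is larger or smaller, which is only one possible reading of the rules; the column names `day`, `step`, `in_range`, `rising` and `falling` are made up for illustration.

```r
library(data.table)

# Hypothetical toy data: two days with a few readings each
dt <- data.table(
  day   = as.IDate(c("2020-01-05", "2020-01-05", "2020-01-05",
                     "2020-01-06", "2020-01-06")),
  Value = c(8, 9, 10.3, 11.3, 10.8)
)

# Difference to the next reading of the same day (NA for the day's last reading)
dt[, step := shift(Value, type = "lead") - Value, by = day]

dt[, `:=`(
  in_range = between(Value, 9, 12),             # G3 condition (bounds inclusive)
  rising   = between(Value, 9, 12) & step > 0,  # G1 condition
  falling  = between(Value, 9, 12) & step < 0   # G2 condition
)]
dt[]
```

The last reading of each day has no successor, so its trend flags come out as `NA`; how to fill them is a choice the question leaves open.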

I was able to get G3, but the other conditions don't work:

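# Run id over "Value in [9, 12] and on the same day as the previous row"
df[, G3 := rleid((Value >= 9 & Value <= 12) & (as.IDate(timestamp, "%d-%m-%Y") == shift(as.IDate(timestamp, "%d-%m-%Y"), type = "lag")))]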
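The same-day condition can also be expressed by passing the day as an extra argument to `rleid()`, which starts a new run whenever any of its arguments changes. A minimal sketch on hypothetical toy vectors (`day` and `flag` are illustrative names, not columns of the question's data):

```r
library(data.table)

# Hypothetical toy vectors: a day label and a "Value in [9, 12]" flag
day  <- c(1, 1, 1, 2, 2, 3)
flag <- c(FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)

# New run id only when the flag changes: 1 2 2 2 2 3
rleid(flag)

# New run id when the flag OR the day changes: 1 2 2 3 3 4
rleid(day, flag)
```

Rows 4 and 5 are in range but on a new day, so they get their own run id instead of continuing the run from day 1, without any explicit `shift()` comparison of dates.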

Desired output:

timestamp              Value    G1  G2  G3
05-01-2020  12:07:08    8        1  1   1
05-01-2020  12:36:05    9        2  1   2
05-01-2020  23:45:02    10.3     2  1   2
05-01-2020  13:44:33    11       2  1   2
06-01-2020  01:07:08    12.5     3  1   3
06-01-2020  10:23:05    11.3     4  2   4
06-01-2020  12:11:08    10.8     4  2   4
06-01-2020  22:06:12    9.7      4  2   4
07-01-2020  00:01:05    9.3      5  3   5
07-01-2020  02:17:09    8.6      6  4   6
07-01-2020  12:36:05    8.3      6  4   6
07-01-2020  12:07:08    7.8      6  4   6
07-01-2020  12:36:05    8.7      6  4   6
07-01-2020  12:36:05    9.3      7  4   7
08-01-2020  12:36:05    9.8      8  4   8
08-01-2020  12:36:05    10.4     8  4   8
08-01-2020  12:36:05    10.5     8  4   8
09-01-2020  12:36:05    10.3     9  5   8
09-01-2020  12:07:08    9.6      9  5   9
09-01-2020  12:36:05    9.1      9  5   9
11-01-2020  12:07:08    8.8      9  6   10
11-01-2020  12:36:05    8.3      9  6   10

1 answer:

Answer (score: 0)

Here is an answer to an interesting question.

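It is hard to tell whether this answer is correct, because the expected result posted in the question seems inconsistent even with the OP's own rules and the OP's own code (e.g., in places it does not start a new group when the day changes).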

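library(data.table)
# Calendar day parsed from "dd-mm-yyyy hh:mm:ss" (the time part is ignored)
setDT(df)[, Date := as.IDate(timestamp, "%d-%m-%Y")]
# Change = difference to the next value within the same day;
# the last row of each day carries forward the previous row's change ("locf")
df[, Change := nafill(c(diff(Value), NA), "locf"), by = Date]
df[, `:=`(
  G1 = rleid(Date, between(Value, 9, 12) & Change > 0),  # in [9, 12] and rising
  G2 = rleid(Date, between(Value, 9, 12) & Change < 0),  # in [9, 12] and falling
  G3 = rleid(Date, between(Value, 9, 12)),               # in [9, 12]
  Date = NULL,                                           # drop the helper columns
  Change = NULL
)][]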
              timestamp Value G1 G2 G3
 1: 05-01-2020 12:07:08   8.0  1  1  1
 2: 05-01-2020 12:36:05   9.0  2  1  2
 3: 05-01-2020 23:45:02  10.3  2  1  2
 4: 05-01-2020 13:44:33  11.0  2  1  2
 5: 06-01-2020 01:07:08  12.5  3  2  3
 6: 06-01-2020 10:23:05  11.3  3  3  4
 7: 06-01-2020 12:11:08  10.8  3  3  4
 8: 06-01-2020 22:06:12   9.7  3  3  4
 9: 07-01-2020 00:01:05   9.3  4  4  5
10: 07-01-2020 02:17:09   8.6  4  5  6
11: 07-01-2020 12:36:05   8.3  4  5  6
12: 07-01-2020 12:07:08   7.8  4  5  6
13: 07-01-2020 12:36:05   8.7  4  5  6
14: 07-01-2020 12:36:05   9.3  5  5  7
15: 08-01-2020 12:36:05   9.8  6  6  8
16: 08-01-2020 12:36:05  10.4  6  6  8
17: 08-01-2020 12:36:05  10.5  6  6  8
18: 09-01-2020 12:36:05  10.3  7  7  9
19: 09-01-2020 12:07:08   9.6  7  7  9
20: 09-01-2020 12:36:05   9.1  7  7  9
21: 11-01-2020 12:07:08   8.8  8  8 10
22: 11-01-2020 12:36:05   8.3  8  8 10
              timestamp Value G1 G2 G3
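The `Change` helper is the key step. A minimal sketch of what it computes for a single day, using the four values of 06-01-2020 from the data: `diff()` gives the difference to the next row, and `nafill(..., "locf")` (last observation carried forward) gives the day's last row the same direction as the row before it. Note that `diff()` follows row order, not time order. The names `v`, `step` and `change` are illustrative only.

```r
library(data.table)

# The four values recorded on 06-01-2020, in row order
v <- c(12.5, 11.3, 10.8, 9.7)

# Difference to the next row; the last row has no successor, hence NA
step <- c(diff(v), NA)

# Carry the previous difference forward into the trailing NA
change <- nafill(step, "locf")

# All four rows point downwards (negative change)
sign(change)
```

Row 5 of the output (12.5) is outside [9, 12], so it is not part of a falling run even though its change is negative; rows 6 to 8 then form one falling run, which is why G2 goes from 2 to 3 there.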

Data

library(data.table)
# Date and time are read as separate columns, then pasted back into `timestamp`
df <- fread(
  "timestamp  time         Value
05-01-2020  12:07:08    8
05-01-2020  12:36:05    9
05-01-2020  23:45:02    10.3
05-01-2020  13:44:33    11
06-01-2020  01:07:08    12.5
06-01-2020  10:23:05    11.3
06-01-2020  12:11:08    10.8
06-01-2020  22:06:12    9.7
07-01-2020  00:01:05    9.3
07-01-2020  02:17:09    8.6
07-01-2020  12:36:05    8.3
07-01-2020  12:07:08    7.8
07-01-2020  12:36:05    8.7
07-01-2020  12:36:05    9.3
08-01-2020  12:36:05    9.8
08-01-2020  12:36:05    10.4
08-01-2020  12:36:05    10.5
09-01-2020  12:36:05    10.3
09-01-2020  12:07:08    9.6
09-01-2020  12:36:05    9.1
11-01-2020  12:07:08    8.8
11-01-2020  12:36:05    8.3")[
  , `:=`(timestamp = paste(timestamp, time), time = NULL)]
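The `fread()` call above leaves `timestamp` as a character column. If a real date-time is needed (for instance, to check the row order within a day), here is a minimal sketch of parsing the question's format; the UTC time zone is an assumption, since the question does not state one, and `ts`/`when` are illustrative names.

```r
library(data.table)

# Two timestamps in the question's "dd-mm-yyyy hh:mm:ss" format
ts <- c("05-01-2020 12:07:08", "06-01-2020 01:07:08")

# Parse into POSIXct; tz = "UTC" is an assumption
when <- as.POSIXct(ts, format = "%d-%m-%Y %H:%M:%S", tz = "UTC")

# Calendar day, as used for the per-day grouping
as.IDate(when)

# Time of day
format(when, "%H:%M:%S")
```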