I have the following dataset as a data.table (dt):
time wea
1 2016-06-16 00:04:50 cloudy
2 2016-06-16 02:14:55 cloudy
3 2016-06-16 03:19:04 cloudy
4 2016-06-16 04:19:24 cloudy
5 2016-06-16 05:23:45 cloudy
6 2016-06-16 06:25:05 cloudy
7 2016-06-16 07:26:46 cloudy
8 2016-06-16 08:27:06 cloudy
9 2016-06-16 09:34:09 sunny
10 2016-06-16 10:36:29 sunny
11 2016-06-16 11:39:09 sunny
12 2016-06-16 12:39:29 sunny
13 2016-06-16 13:44:50 rainy
14 2016-06-16 14:46:31 rainy
15 2016-06-16 15:47:12 rainy
16 2016-06-16 16:48:12 rainy
17 2016-06-16 17:49:54 sunny
18 2016-06-16 18:50:34 sunny
19 2016-06-16 19:51:34 cloudy
20 2016-06-16 20:53:55 cloudy
21 2016-06-16 21:56:51 cloudy
22 2016-06-16 22:30:42 cloudy
23 2016-06-16 23:26:14 cloudy
What I want is this:
wea start.point end.point
1: cloudy 1 8
2: sunny 9 12
3: rainy 13 16
4: sunny 17 18
5: cloudy 19 23
I tried the following, but it did not work:
dt[, .(start.point = head(.I,1), end.point = tail(.I,1)) , by="wea"]
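This fails because `by` lumps the second "cloudy" block together with the first one (cloudy comes out as rows 1 to 23, sunny as rows 9 to 18), but I want each run of consecutive identical values kept as a separate block.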
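To make the problem concrete: grouping by `wea` alone yields only as many groups as there are distinct weather values, whereas what is needed is one group per run. The base-R sketch below builds such a run id by hand. It assumes the `wea` column has been typed in from the table above (`run_id` is a hypothetical name), and it is meant only to illustrate the idea of a run-length id, not to reproduce data.table internals.

```r
# Weather column copied by hand from the table above (23 rows)
wea <- c(rep("cloudy", 8), rep("sunny", 4), rep("rainy", 4),
         rep("sunny", 2), rep("cloudy", 5))

# Flag every row whose value differs from the previous row (the first row
# always starts a run); the cumulative sum of those flags numbers the runs.
run_id <- cumsum(c(TRUE, wea[-1] != wea[-length(wea)]))

length(unique(wea))     # 3 groups when grouping by wea alone
length(unique(run_id))  # 5 groups when grouping by run
```

Grouping by this kind of run id, rather than by the value itself, is what keeps the two "cloudy" blocks apart.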
Answer 0 (score: 5)
You can use `.I` to get the row indices and put `rleid(wea)` in the `by` statement to separate the consecutive blocks. Use:
dt[, .(start.point = .I[1], end.point = .I[.N]), by = .(wea, rleid(wea))
][, rleid := NULL][]
This gives:
wea start.point end.point
1: cloudy 1 8
2: sunny 9 12
3: rainy 13 16
4: sunny 17 18
5: cloudy 19 23
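If data.table is not at hand, the same start and end rows can be derived in base R with `rle()`. This is a swapped-in alternative, not the answer's method; it again assumes the `wea` column typed in from the question, and `r` and `res` are hypothetical names.

```r
# Weather column copied by hand from the question's table
wea <- c(rep("cloudy", 8), rep("sunny", 4), rep("rainy", 4),
         rep("sunny", 2), rep("cloudy", 5))

r <- rle(wea)                               # value and length of each run
end.point   <- cumsum(r$lengths)            # last row index of each run
start.point <- end.point - r$lengths + 1L   # first row index of each run

res <- data.frame(wea = r$values,
                  start.point = start.point,
                  end.point = end.point,
                  stringsAsFactors = FALSE)
print(res)
```

This reproduces the five rows shown above. Because `rle()` works on the column order as given, the data must already be sorted by `time`, just as `rleid()` assumes.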