I have a large time series (in data frame format, n >= 6000) that looks like this:
time, precip
1 2005-09-30 11:45:00, 0.08
2 2005-09-30 23:45:00, 0.72
3 2005-10-01 11:45:00, 0.01
4 2005-10-01 23:45:00, 0.08
5 2005-10-02 11:45:00, 0.10
6 2005-10-02 23:45:00, 0.33
7 2005-10-03 11:45:00, 0.15
8 2005-10-03 23:45:00, 0.30
9 2005-10-04 11:45:00, 0.00
10 2005-10-04 23:45:00, 0.00
11 2005-10-05 11:45:00, 0.02
12 2005-10-05 23:45:00, 0.00
13 2005-10-06 11:45:00, 0.00
14 2005-10-06 23:45:00, 0.01
15 2005-10-07 11:45:00, 0.00
16 2005-10-07 23:45:00, 0.00
17 2005-10-08 11:45:00, 0.00
18 2005-10-08 23:45:00, 0.16
19 2005-10-09 11:45:00, 0.03
20 2005-10-09 23:45:00, 0.00
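Each row has a time (YYYY-MM-DD HH:MM:SS, a 12-hourly time series) and a precipitation amount. I want to separate the data by storm event.
What I want to do is: 1) add a new column called "storm" 2) treat each group of nonzero amounts separated by zeros as one storm.
For example: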
Time, Precip, Storm
1 2005-09-30 11:45:00, 0.08, 1
2 2005-09-30 23:45:00, 0.72, 1
3 2005-10-01 11:45:00, 0.01, 1
4 2005-10-01 23:45:00, 0.08, 1
5 2005-10-02 11:45:00, 0.10, 1
6 2005-10-02 23:45:00, 0.33, 1
7 2005-10-03 11:45:00, 0.15, 1
8 2005-10-03 23:45:00, 0.30, 1
9 2005-10-04 11:45:00, 0.00
10 2005-10-04 23:45:00, 0.00
11 2005-10-05 11:45:00, 0.02, 2
12 2005-10-05 23:45:00, 0.00
13 2005-10-06 11:45:00, 0.00
14 2005-10-06 23:45:00, 0.01, 3
15 2005-10-07 11:45:00, 0.00
16 2005-10-07 23:45:00, 0.00
17 2005-10-08 11:45:00, 0.00
18 2005-10-08 23:45:00, 0.16, 4
19 2005-10-09 11:45:00, 0.03, 4
20 2005-10-09 23:45:00, 0.00
After that, my plan is to group the data by storm event.
I am very new to R, so don't be afraid to point out the obvious. Thanks a lot for your help!
Answer 0: (score: 4)
You can flag the time points that belong to a storm, then use rle and modify the result:
# assuming your data is called rainfall
# identify whether a precipitation has been recorded at each timepoint
rainfall$storm <- rainfall$precip > 0
# do run length encoding on this storm indicator
storms <- rle(rainfall$storm)
# set the FALSE values to NA
is.na(storms$values) <- !storms$values
# replace the TRUE values with a number in sequence
storms$values[which(storms$values)] <- seq_len(sum(storms$values, na.rm = TRUE))
# use inverse.rle to revert to the full length column
rainfall$stormNumber <- inverse.rle(storms)
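To see what each step produces, here is a minimal self-contained sketch of the same rle approach, run on the first 12 precip values from the question. The names wet and runs are my own, chosen only for illustration, and the small data frame stands in for the real one.

```r
# First 12 precip values copied from the question's sample data
rainfall <- data.frame(
  precip = c(0.08, 0.72, 0.01, 0.08, 0.10, 0.33,
             0.15, 0.30, 0.00, 0.00, 0.02, 0.00)
)

# TRUE wherever precipitation was recorded
wet <- rainfall$precip > 0

# Run-length encode the indicator: one entry per run of TRUE or FALSE
runs <- rle(wet)

# Dry runs become NA, wet runs get consecutive numbers 1, 2, ...
is.na(runs$values) <- !runs$values
runs$values[which(runs$values)] <- seq_len(sum(runs$values, na.rm = TRUE))

# Expand back to one value per row
rainfall$stormNumber <- inverse.rle(runs)

print(rainfall)
```

Once the column exists, split(rainfall, rainfall$stormNumber) should give one data frame per storm, since split drops the rows whose grouping value is NA.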
Answer 1: (score: 2)
Assume this input:
Lines <- "time, precip
1 2005-09-30 11:45:00, 0.08
2 2005-09-30 23:45:00, 0.72
3 2005-10-01 11:45:00, 0.01
4 2005-10-01 23:45:00, 0.08
5 2005-10-02 11:45:00, 0.10
6 2005-10-02 23:45:00, 0.33
7 2005-10-03 11:45:00, 0.15
8 2005-10-03 23:45:00, 0.30
9 2005-10-04 11:45:00, 0.00
10 2005-10-04 23:45:00, 0.00
11 2005-10-05 11:45:00, 0.02
12 2005-10-05 23:45:00, 0.00
13 2005-10-06 11:45:00, 0.00
14 2005-10-06 23:45:00, 0.01
15 2005-10-07 11:45:00, 0.00
16 2005-10-07 23:45:00, 0.00
17 2005-10-08 11:45:00, 0.00
18 2005-10-08 23:45:00, 0.16
19 2005-10-09 11:45:00, 0.03
20 2005-10-09 23:45:00, 0.00
"
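We read in the data and then create a logical vector that is TRUE for each nonzero precipitation whose previous value is zero. We prepend a first value that is TRUE if z[1] is nonzero and FALSE if it is zero. Applying cumsum to this vector gives the correct storm number in the positions that correspond to nonzero precip values. To handle the positions that correspond to zero precip values, we use replace to store empty into them: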
# read in data
library(zoo)
z <- read.zoo(text = Lines, skip = 1, tz = "", index = 2:3)[, 2]
# calculate
e <- NA # empty
cbind(precip = z, storm = replace(cumsum(c(z[1]!=0, z!=0 & lag(z,-1)==0)), z==0, e))
The last line gives this:
precip storm
2005-09-30 11:45:00 0.08 1
2005-09-30 23:45:00 0.72 1
2005-10-01 11:45:00 0.01 1
2005-10-01 23:45:00 0.08 1
2005-10-02 11:45:00 0.10 1
2005-10-02 23:45:00 0.33 1
2005-10-03 11:45:00 0.15 1
2005-10-03 23:45:00 0.30 1
2005-10-04 11:45:00 0.00 NA
2005-10-04 23:45:00 0.00 NA
2005-10-05 11:45:00 0.02 2
2005-10-05 23:45:00 0.00 NA
2005-10-06 11:45:00 0.00 NA
2005-10-06 23:45:00 0.01 3
2005-10-07 11:45:00 0.00 NA
2005-10-07 23:45:00 0.00 NA
2005-10-08 11:45:00 0.00 NA
2005-10-08 23:45:00 0.16 4
2005-10-09 11:45:00 0.03 4
2005-10-09 23:45:00 0.00 NA
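If you would rather not depend on zoo, the same cumsum idea can be sketched in base R on a plain numeric vector. This is only an illustration under my own assumptions: the short precip vector below is made up from values in the question, and prev, starts, storm and totals are hypothetical names. It also shows one possible way to do the per-storm grouping mentioned in the question, here with tapply.

```r
# A short made-up series using values from the question
precip <- c(0.08, 0.72, 0.00, 0.00, 0.02, 0.00, 0.16, 0.03, 0.00)

# Previous value of each element; the first element is treated as
# if it were preceded by a zero
prev <- c(0, head(precip, -1))

# TRUE where a storm starts: a nonzero value whose predecessor is zero
starts <- precip != 0 & prev == 0

# Running count of storm starts, set to NA on dry rows
storm <- replace(cumsum(starts), precip == 0, NA)

# Total precipitation per storm; rows with an NA storm are dropped
totals <- tapply(precip, storm, sum)

print(data.frame(precip, storm))
print(totals)
```

Padding prev with a leading zero plays the same role as prepending z[1] != 0 in the zoo version: a series that begins with rain opens storm 1 at the first row.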