I have this data frame (called signal):
Date Sig
1 2012-03-25 Go
2 2012-04-15 Stop
3 2012-04-22 Stop
4 2012-05-13 Stop
5 2012-05-20 Stop
6 2012-06-24 Go
7 2012-09-23 Go
8 2012-09-30 Go
9 2012-10-14 Stop
10 2012-12-02 Go
11 2012-12-16 Stop
I am trying to merge/join the date ranges so as to create something like this:
Start Stop Sig
1 2012-03-25 2012-04-15 Go
2 2012-04-15 2012-06-24 Stop
3 2012-06-24 2012-10-14 Go
4 2012-10-14 2012-12-02 Stop
5 2012-12-02 2012-12-16 Go
Any ideas, please?
Answer 0 (score: 1)
So far, this old question has not received a correct answer. Here is a concise solution using the rleid() function from data.table:
library(data.table)
setDT(signal)[order(Date), .(Start = first(Date)), by = .(rleid(Sig), Sig)][
, Stop := shift(Start, type = "lead")][
-.N, !"rleid"]
    Sig      Start       Stop
1:   Go 2012-03-25 2012-04-15
2: Stop 2012-04-15 2012-06-24
3:   Go 2012-06-24 2012-10-14
4: Stop 2012-10-14 2012-12-02
5:   Go 2012-12-02 2012-12-16
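setDT() coerces signal to class data.table. Then signal is ordered by Date and aggregated by consecutive streaks of Sig, using rleid(Sig) and Sig as grouping variables. The first Date of each group is picked as Start. To determine the stop date, the new Start column is shifted with type = "lead", so each row receives the Start of the following group. Finally, the last row (which has no following group) and the rleid grouping variable are removed.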
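To see what the two helper functions contribute, here is a minimal sketch on toy vectors. The vectors sig and start are made up for illustration and are not part of the answer. rleid() assigns a new id whenever the value changes, and shift(type = "lead") pulls the next element into the current position.

```r
library(data.table)

# Toy signal values (illustrative only)
sig <- c("Go", "Stop", "Stop", "Go", "Go", "Stop")

# rleid() numbers consecutive runs of identical values
run_id <- rleid(sig)
print(run_id)   # 1 2 2 3 3 4

# shift(type = "lead") moves each value one position up; the last becomes NA
start <- c(10L, 20L, 40L, 60L)
stop_at <- shift(start, type = "lead")
print(stop_at)  # 20 40 60 NA
```

The trailing NA is why the solution drops the last row with -.N: the final run has no following run to supply a stop date.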
OP's data: the signal data frame shown in the question.
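A minimal sketch of one way to recreate signal from the table in the question, so that the solution can be run end to end. Building it with data.frame() and as.Date() is an assumption, and so is the name result; the answer may have created the data differently.

```r
library(data.table)

# Recreate the question's data (the construction method is an assumption)
signal <- data.frame(
  Date = as.Date(c("2012-03-25", "2012-04-15", "2012-04-22", "2012-05-13",
                   "2012-05-20", "2012-06-24", "2012-09-23", "2012-09-30",
                   "2012-10-14", "2012-12-02", "2012-12-16")),
  Sig = c("Go", "Stop", "Stop", "Stop", "Stop", "Go", "Go", "Go",
          "Stop", "Go", "Stop")
)

# Same pipeline as in the answer, stored in `result` for inspection
result <- setDT(signal)[order(Date), .(Start = first(Date)), by = .(rleid(Sig), Sig)][
  , Stop := shift(Start, type = "lead")][
  -.N, !"rleid"]
print(result)
```

Note that setDT() converts signal by reference, so signal itself is a data.table afterwards.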
Answer 1 (score: 0)
The approach I would take is to sort the segments and then collapse the back-to-back segments that have the same value.
library(data.table)
## generate a (similar?) data set
df <- data.frame(dates = rep(as.Date('01-01-2010', '%m-%d-%Y'), 20) + sample(1:100, 20),
                 sig = sample(c('stop', 'go'), 20, replace = TRUE))
df$sig <- as.character(df$sig)
df <- df[order(df$dates), ]
### create the lag variable for date (kept as Date; not used further below)
df$dates2 <- c(as.Date(NA), head(df$dates, -1))
### create the lag variable for sig
df$sig2 <- c(NA, head(df$sig, -1))
## create a variable that triggers a new segment
df$grp <- as.numeric(df$sig != df$sig2)
df$grp[1] <- 0
### the cumsum of the trigger is actually the grouping variable
df$grp2 <- cumsum(df$grp)
## using data.table: group by grp2 and keep a single sig value per run
dt <- data.table(df)
dt2 <- dt[, .(start = min(dates), end = max(dates), sig = sig[1]), by = grp2]
dt2 then has one row per run of identical sig values, with the first and last date of that run. Because the data are sampled at random, the exact rows differ from run to run. Note that end is the last date inside the run, not the start date of the next run.
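The grouping trick used above (flag each change, then take the cumulative sum) can be checked on a small deterministic vector. This is a base R sketch with a made-up vector sig, not the answerer's code:

```r
# Toy, already-sorted signal values (illustrative only)
sig <- c("go", "go", "stop", "stop", "stop", "go")

# TRUE wherever the value differs from the previous one; the first element starts run 0
changed <- c(FALSE, sig[-1] != sig[-length(sig)])

# The running total of changes labels each run of identical values
grp <- cumsum(changed)
print(grp)  # 0 0 1 1 1 2
```

Grouping by this label, rather than by the 0/1 change flag itself, is what collapses each run into a single row.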