Find the last transaction for each id

Time: 2012-08-27 02:47:01

Tags: r

I have the following data frame:

id<-c(1,1,1,1,1,3,3,3,3)
period<-c("calib","calib","calib","valid","valid","calib","calib","calib","valid")
date<-c("11-11-07","11-11-07","23-11-07","12-12-08","17-12-08","11-11-07","23-11-07","23-11-07","16-01-08")
time<-c(12,13,14,11,23,15,12,18,14)
df<-data.frame(id,period,time,date)
df$date2<-as.Date(as.character(df$date), format = "%d-%m-%y")


id period time     date      date2
 1  calib   12 11-11-07 2007-11-11
 1  calib   13 11-11-07 2007-11-11
 1  calib   14 23-11-07 2007-11-23
 1  valid   11 12-12-08 2008-12-12
 1  valid   23 17-12-08 2008-12-17
 3  calib   15 11-11-07 2007-11-11
 3  calib   12 23-11-07 2007-11-23
 3  calib   18 23-11-07 2007-11-23
 3  valid   14 16-01-08 2008-01-16

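For each id, I need to extract the date of the last transaction in the calib period and put it in a new column. If two transactions were made on the same day (same date), the last one should be chosen based on the transaction time. The final table I am looking for is as follows: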

id period time     date      date2  last
 1  calib   12 11-11-07 2007-11-11   NA
 1  calib   13 11-11-07 2007-11-11   NA
 1  calib   14 23-11-07 2007-11-23 2007-11-23
 1  valid   11 12-12-08 2008-12-12   NA
 1  valid   23 17-12-08 2008-12-17   NA 
 3  calib   15 11-11-07 2007-11-11   NA
 3  calib   12 23-11-07 2007-11-23   NA
 3  calib   18 23-11-07 2007-11-23 2007-11-23
 3  valid   14 16-01-08 2008-01-16   NA

Can anyone help me?

1 answer:

Answer 0 (score: 1)

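This can be solved with rle, provided the rows within each id are already ordered by date and time, as they are in the example: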

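L1 <- lapply(split(df, df[, "id"]), function(dat){
    # start with an all-NA Date column
    dat[, "last"] <- as.Date(NA)
    # runs of consecutive identical period values
    x <- rle(as.character(dat[, "period"]))
    # row index at which each run ends
    z <- cumsum(x[["lengths"]])
    # copy date2 into last at the end of each calib run
    dat$last[z[x[["values"]] == "calib"]] <- dat[z[x[["values"]] == "calib"],
        "date2"]
    dat
})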

data.frame(do.call(rbind, L1), row.names = NULL)
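
The rle approach picks the last row of each calib run, so it depends on the row order. If the rows might not be sorted, a possible alternative is to order the calib rows of each id by date2 and then time explicitly. The following is only a minimal sketch of that idea; the helper name mark_last_calib is hypothetical and not part of the original answer.

```r
# Rebuild the example data from the question.
df <- data.frame(
  id     = c(1, 1, 1, 1, 1, 3, 3, 3, 3),
  period = c("calib", "calib", "calib", "valid", "valid",
             "calib", "calib", "calib", "valid"),
  time   = c(12, 13, 14, 11, 23, 15, 12, 18, 14),
  date   = c("11-11-07", "11-11-07", "23-11-07", "12-12-08", "17-12-08",
             "11-11-07", "23-11-07", "23-11-07", "16-01-08"),
  stringsAsFactors = FALSE
)
df$date2 <- as.Date(df$date, format = "%d-%m-%y")

# Hypothetical helper (an assumption, not from the original answer):
# for each id, mark the calib row with the latest date2,
# breaking ties by the largest time.
mark_last_calib <- function(d) {
  d$last <- as.Date(NA)
  calib <- which(d$period == "calib")
  for (rows in split(calib, d$id[calib])) {
    # order() sorts by date2 first, then time; the final index is the latest
    ord <- order(d$date2[rows], d$time[rows])
    pick <- rows[ord[length(ord)]]
    d$last[pick] <- d$date2[pick]
  }
  d
}

res <- mark_last_calib(df)
res
```

On the example data this fills last with 2007-11-23 in rows 3 and 8 and leaves NA elsewhere, which matches the table requested in the question. Because the ordering is done inside the helper, the same rows are selected even if df is shuffled beforehand.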