Calculating the mean and gradient of a set of rows preceding each timestamp in another dataset

Time: 2018-07-06 21:15:55

Tags: r

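I have two datasets (db.temp, db.trans). Each has timestamps and monitored environmental data. db.temp is sampled at a short interval (roughly once per minute), while db.trans is sampled every 50 minutes.

Now I want to compute the mean of db.temp (db.temp$WT) over the 15 minutes before each timestamp in db.trans (about 15 points on average, since some data may be missing), and then append that mean to db.trans as a new column. I also want to use linear regression to compute the gradient (slope of change) of those same data points for each timestamp in db.trans, and append that to db.trans as well.

I have a method that uses loops and if conditions, but my dataset needs something much more efficient.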

# sampling period of db.temp (start/end) and of db.trans (start1/end1)
start  <- as.POSIXct("2016-10-02 07:00:21", format = "%Y-%m-%d %H:%M:%S", tz = "")
end    <- as.POSIXct("2016-11-06 23:00:00", format = "%Y-%m-%d %H:%M:%S", tz = "")
start1 <- as.POSIXct("2016-10-05 17:30:00", format = "%Y-%m-%d %H:%M:%S", tz = "")
end1   <- as.POSIXct("2016-11-04 20:10:00", format = "%Y-%m-%d %H:%M:%S", tz = "")

Temp.time  <- seq(start, end, by = 60)      # one reading per minute
Temp.v     <- runif(length(Temp.time), min = 20, max = 40)
Trans.time <- seq(start1, end1, by = 3000)  # one reading every 50 minutes
Trans.v    <- sample(state.name, length(Trans.time), replace = TRUE)
Trans.t    <- runif(length(Trans.time), min = 10, max = 26)

db.temp  <- data.frame(Time = Temp.time, WT = Temp.v)
db.trans <- data.frame(Time = Trans.time, state = Trans.v, outT = Trans.t)
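
To make the target computation concrete, here is a naive per-timestamp baseline on small toy data. It is only a sketch of the kind of slow loop-style approach mentioned above, not the original loop code; the helper name `window_stats` and the toy values are illustrative assumptions. The slope is fitted against minutes relative to the target timestamp, so it is in WT units per minute.

```r
# Toy data (hypothetical values): one reading per minute, two target time stamps
temp <- data.frame(
  Time = as.POSIXct("2016-10-05 17:00:00", tz = "UTC") + 60 * (0:59),
  WT   = seq(20, by = 0.1, length.out = 60)  # rises by 0.1 per minute
)
trans <- data.frame(
  Time = as.POSIXct(c("2016-10-05 17:30:00", "2016-10-05 17:55:00"), tz = "UTC")
)

# Mean and regression slope of WT over the width_mins minutes up to and including t
window_stats <- function(t, temp, width_mins = 15) {
  mins   <- as.numeric(difftime(temp$Time, t, units = "mins"))  # <= 0 before t
  in_win <- mins >= -width_mins & mins <= 0
  w      <- data.frame(mins = mins[in_win], WT = temp$WT[in_win])
  slope  <- if (nrow(w) >= 2) coef(lm(WT ~ mins, data = w))[[2]] else NA_real_
  c(mean.WT = mean(w$WT), slope.WT = slope)
}

# One call per target time stamp; simple, but slow on large data
stats <- t(sapply(seq_along(trans$Time),
                  function(i) window_stats(trans$Time[i], temp)))
trans <- cbind(trans, stats)
print(trans)
```

A baseline like this is handy for checking a faster solution on a small subset, since both should produce the same mean and slope for each timestamp.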

1 Answer:

Answer 0: (score: 0)

If I understand your question correctly, the rolling join feature of the data.table package should be useful here:

Prepare the data

library(data.table)

# convert both data frames to data.table
# (this is done by reference, for greater efficiency on large datasets)
setDT(db.temp)
setDT(db.trans)

# to avoid confusion, rename time column in db.temp
setnames(db.temp, "Time", "temp.Time")

# duplicate columns to be used for rolling join
db.temp[, join_time := temp.Time]
db.trans[, join_time := Time]

# set keys for join
setkey(db.temp, join_time)
setkey(db.trans, join_time)

Join the data

# join datasets by matching each db.temp row to the first db.trans row at or after it
db.trans2 <- db.trans[db.temp, roll = -Inf]

# calculate time difference between the time stamps
db.trans2[, diff.time := as.numeric(difftime(Time, temp.Time, units = "mins"))]

# filter out db.temp rows that had no matching db.trans row (i.e. they took place entirely 
# after the last time stamp in the db.trans dataset)
# also filter for rows where the difference between db.temp & db.trans time stamps is 
# 15 min or less
db.trans2 <- db.trans2[!is.na(Time) & diff.time <= 15]
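
To see what `roll = -Inf` does in the join above, here is a minimal sketch on toy data. The tables `temp` and `trans` and their values are hypothetical stand-ins for db.temp and db.trans, with readings every 5 minutes to keep the output short.

```r
library(data.table)

# Hypothetical toy data: readings every 5 minutes, two target time stamps
temp <- data.table(
  temp.Time = as.POSIXct("2016-10-05 17:00:00", tz = "UTC") + 300 * (0:6),
  WT        = c(20, 21, 22, 23, 24, 25, 26)
)
trans <- data.table(
  Time = as.POSIXct(c("2016-10-05 17:10:00", "2016-10-05 17:25:00"), tz = "UTC")
)

# Same preparation as above: duplicate the time columns and key on the copy
temp[, join_time := temp.Time]
trans[, join_time := Time]
setkey(temp, join_time)
setkey(trans, join_time)

# roll = -Inf: each temp row picks up the next trans row at or after it
joined <- trans[temp, roll = -Inf]
print(joined[, .(temp.Time, WT, Time)])
```

The readings at 17:00, 17:05 and 17:10 are matched to the 17:10 time stamp, and those at 17:15, 17:20 and 17:25 to 17:25. The 17:30 reading falls after the last target time stamp, so its Time is NA, which is why the `!is.na(Time)` filter is needed.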

Calculate

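# for each db.trans time stamp, calculate the mean / gradient of db.temp WT values
# (the regression uses diff.time, i.e. minutes before the db.trans time stamp, instead of
# raw POSIXct values: those are around 1.5e9 seconds and nearly collinear with the
# intercept over a 15-minute window, so lm() can return NA or an imprecise slope;
# diff.time counts backwards in time, hence the sign flip; slope is in WT units per minute)
db.trans3 <- db.trans2[, list(mean.WT = mean(WT),
                              slope.WT = -coef(lm(WT ~ diff.time))[[2]]),
                       by = list(Time, state, outT)]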
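
The question asks for the results to be appended to db.trans as new columns. One way to do that, sketched here on hypothetical stand-in tables (`trans` for db.trans, `stats` for the per-time-stamp summary), is a data.table update join. The values below are made up for illustration.

```r
library(data.table)

# Hypothetical stand-ins for db.trans and the summary table
trans <- data.table(
  Time  = as.POSIXct("2016-10-05 17:30:00", tz = "UTC") + 3000 * (0:2),
  state = c("Ohio", "Utah", "Iowa")
)
stats <- data.table(
  Time     = trans$Time[c(1, 3)],  # the second time stamp had no readings
  mean.WT  = c(30.2, 28.7),
  slope.WT = c(0.05, -0.02)
)

# Update join: add the summary columns to trans by reference, matching on Time
trans[stats, on = "Time", `:=`(mean.WT = i.mean.WT, slope.WT = i.slope.WT)]
print(trans)
```

Time stamps without any readings in their window keep NA in the new columns, so all rows of the original table are preserved.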