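I have two datasets (db.temp and db.trans). Each one has time stamps and monitored environmental data. db.temp is sampled at a short interval (roughly once a minute), while db.trans is sampled every 50 minutes.
For each time stamp in db.trans, I want to average db.temp$WT over the 15 minutes before it (about 15 points, since some readings may be missing) and append that mean to db.trans as a new column. I also want to fit a linear regression to the same data points for each db.trans time stamp to get the gradient (slope of change), and append that to db.trans as well.
I have a method that uses a loop, but my datasets are large, so I need a much more efficient approach.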
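For reference, the calculation for a single db.trans time stamp can be written as a small base-R function. This is a sketch: `window_stats` is a hypothetical helper, not code from the post, and it assumes the window includes both ends, i.e. readings taken 0 to 15 minutes before the time stamp. The slope is the ordinary least-squares slope cov(x, y) / var(x) with time in seconds.

```r
# Hypothetical helper (not from the post): mean and least-squares slope of `wt`
# over the readings taken 0 to `width_min` minutes before time stamp `t`
window_stats <- function(t, temp_time, wt, width_min = 15) {
  lag_min <- as.numeric(difftime(t, temp_time, units = "mins"))
  in_win  <- !is.na(lag_min) & lag_min >= 0 & lag_min <= width_min
  x <- as.numeric(temp_time[in_win])  # seconds since 1970-01-01
  y <- wt[in_win]
  c(mean.WT  = if (length(y) > 0) mean(y) else NA_real_,
    # OLS slope cov(x, y) / var(x), in WT units per second
    slope.WT = if (length(y) >= 2) cov(x, y) / var(x) else NA_real_)
}

# Usage on toy data: one reading per minute, WT rising 0.5 per minute
temp_time <- as.POSIXct("2016-10-05 17:00:00", tz = "UTC") + 60 * (0:30)
wt        <- 20 + 0.5 * (0:30)
# window holds the 16 readings from 17:15 to 17:30:
# mean.WT = 31.25, slope.WT = 0.5 / 60 (about 0.00833)
window_stats(as.POSIXct("2016-10-05 17:30:00", tz = "UTC"), temp_time, wt)
```

Applied with sapply() over every row of db.trans this is exactly the slow per-time-stamp approach, but it is handy for spot-checking the output of a faster method on a few rows.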
start  <- as.POSIXct("2016-10-02 07:00:21", format = "%Y-%m-%d %H:%M:%S", tz = "")
end    <- as.POSIXct("2016-11-06 23:00:00", format = "%Y-%m-%d %H:%M:%S", tz = "")
start1 <- as.POSIXct("2016-10-05 17:30:00", format = "%Y-%m-%d %H:%M:%S", tz = "")
end1   <- as.POSIXct("2016-11-04 20:10:00", format = "%Y-%m-%d %H:%M:%S", tz = "")
# db.temp: one reading every 60 s
Temp.time <- seq(start, end, by = 60)
Temp.v    <- runif(length(Temp.time), min = 20, max = 40)
# db.trans: one record every 3000 s (50 min)
Trans.time <- seq(start1, end1, by = 3000)
Trans.v    <- sample(state.name, length(Trans.time), replace = TRUE)
Trans.t    <- runif(length(Trans.time), min = 10, max = 26)
db.temp  <- data.frame(Time = Temp.time, WT = Temp.v)
db.trans <- data.frame(Time = Trans.time, state = Trans.v, outT = Trans.t)
Answer 0 (score: 0)
If I understand your question correctly, the rolling join feature of the data.table package should be useful here:
Prepare the data
library(data.table)
# convert both data frames to data.table
# (this is done by reference, for greater efficiency on large datasets)
setDT(db.temp)
setDT(db.trans)
# to avoid confusion, rename time column in db.temp
setnames(db.temp, "Time", "temp.Time")
# duplicate columns to be used for rolling join
db.temp[, join_time := temp.Time]
db.trans[, join_time := Time]
# set keys for join
setkey(db.temp, join_time)
setkey(db.trans, join_time)
Join the data
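# join datasets by matching each db.temp row to the first db.trans row at or after it
# (roll = -Inf carries the next observation backward); since db.trans time stamps are
# 50 min apart, the 15-min windows never overlap, so each db.temp row needs only one match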
db.trans2 <- db.trans[db.temp, roll = -Inf]
# calculate time difference between the time stamps
db.trans2[, diff.time := as.numeric(difftime(Time, temp.Time, units = "mins"))]
# drop db.temp rows that have no matching db.trans row (i.e. readings taken
# after the last time stamp in db.trans)
# also keep only rows where the db.trans time stamp is at most 15 min after the
# db.temp time stamp
db.trans2 <- db.trans2[!is.na(Time) & diff.time <= 15]
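To see what `roll = -Inf` and the 15-minute filter do, here is a small sketch on hypothetical toy data (the tables `trans` and `temp` and their values are made up for illustration). It follows the same steps as the code above: duplicate the time column, key both tables, roll-join, then filter on the time difference.

```r
library(data.table)

# Hypothetical toy data: two trans stamps 50 min apart, five temp readings
trans <- data.table(Time = as.POSIXct(c("2016-10-05 17:30:00",
                                        "2016-10-05 18:20:00"), tz = "UTC"))
temp <- data.table(
  temp.Time = as.POSIXct(c("2016-10-05 17:10:00", "2016-10-05 17:20:00",
                           "2016-10-05 17:30:00", "2016-10-05 18:10:00",
                           "2016-10-05 18:30:00"), tz = "UTC"),
  WT = c(1, 2, 3, 4, 5)
)
trans[, join_time := Time]
temp[, join_time := temp.Time]
setkey(trans, join_time)
setkey(temp, join_time)

# roll = -Inf (next observation carried backward): each temp row is matched
# to the first trans row at or after its own time stamp
joined <- trans[temp, roll = -Inf]
joined[, diff.time := as.numeric(difftime(Time, temp.Time, units = "mins"))]
joined[, .(temp.Time, Time, WT, diff.time)]

# Keep readings taken at most 15 min before their trans stamp
joined[!is.na(Time) & diff.time <= 15]
```

Here the 17:10 reading is matched to 17:30 but is 20 minutes early, so the filter drops it; the 18:30 reading has no later trans stamp, so its `Time` is NA; the readings with WT 2, 3 and 4 remain.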
Calculate
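# for each db.trans time stamp, calculate the mean and the gradient of the db.temp WT values;
# lm() treats the POSIXct temp.Time as seconds, so slope.WT is in WT units per second
# (a group with a single reading gives an NA slope)
db.trans3 <- db.trans2[, list(mean.WT = mean(WT),
                              slope.WT = coef(lm(WT ~ temp.Time))[[2]]),
                       by = list(Time, state, outT)]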
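A possible refinement, sketched on hypothetical toy data: fitting lm() once per group is relatively expensive, and regressing on raw POSIXct values (around 1.5e9 seconds) may cost numerical precision. The OLS slope has the closed form sum(dx * dy) / sum(dx^2), where dx and dy are deviations from the group means, so it can be computed directly on centred seconds. The sketch also writes the results back into the trans table with an update join, since the question asks for new columns there. The tables `trans` (standing in for db.trans) and `filtered` (standing in for the filtered join result db.trans2) are made up for illustration.

```r
library(data.table)

# Hypothetical toy stand-ins: `trans` plays db.trans, `filtered` plays the
# filtered join result (one trans stamp with 16 readings in its window)
trans <- data.table(Time = as.POSIXct(c("2016-10-05 17:30:00",
                                        "2016-10-05 18:20:00"), tz = "UTC"))
filtered <- data.table(
  Time      = as.POSIXct("2016-10-05 17:30:00", tz = "UTC"),
  temp.Time = as.POSIXct("2016-10-05 17:15:00", tz = "UTC") + 60 * (0:15),
  WT        = 20 + 0.5 * (0:15)
)

# Per trans time stamp: mean of WT and closed-form OLS slope of WT on time.
# Time is converted to seconds and centred before use.
stats <- filtered[, {
  x <- as.numeric(temp.Time)
  x <- x - mean(x)
  y <- WT - mean(WT)
  list(mean.WT  = mean(WT),
       slope.WT = if (.N >= 2) sum(x * y) / sum(x^2) else NA_real_)  # WT per second
}, by = .(Time)]

# Update join: add the new columns to `trans` by reference;
# trans rows with no readings in their window get NA
trans[stats, on = "Time", `:=`(mean.WT = i.mean.WT, slope.WT = i.slope.WT)]
trans
```

Because `:=` modifies `trans` in place, no copy of the larger table is made; on the real data the same update join would use db.trans and the grouped result instead of the toy tables.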