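My data frame has three columns: start_time, end_time, and energy. start_time and end_time are datetimes, and energy is the energy consumed between those two times.
My goal is to calculate the energy consumed on each day. For rows where start_time and end_time fall on the same date, the energy value is simply assigned to that date. But I need a way to apportion the energy value for rows where start_time and end_time fall on different dates. For example, a row like this in the data frame: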
start_time end_time energy
2014-06-09 20:54:10 2014-06-11 05:04:14 1114
should produce rows like these in the output data frame:
date energy
2014-06-09 <energy consumed between 2014-06-09 20:54:10 to 2014-06-09 23:59:59>
2014-06-10 <energy consumed between 2014-06-10 00:00:00 to 2014-06-10 23:59:59>
2014-06-11 <energy consumed between 2014-06-11 00:00:00 to 2014-06-11 05:04:14>
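Answer 0 (score: 0)
I haven't tested this much (the data frame provided is a bit sparse...), but it seems to work. It splits each interval at midnight and allocates the energy in proportion to the time spent in each day.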
calcEnergy <- function(startCol, endCol, valCol) {
  library(chron)

  # calculate start and finish times; unlist() drops the chron class, leaving
  # numeric days since the chron origin (the fraction is the time of day)
  chron.fun <- function(x) chron(x[1], x[2], format = c('y-m-d', 'h:m:s'))
  starts <- unlist(lapply(strsplit(as.character(startCol), " "), chron.fun))
  ends <- unlist(lapply(strsplit(as.character(endCol), " "), chron.fun))

  # need to expand the data frame to accommodate new rows, so calculate the
  # number of calendar days each original observation touches
  nrows <- ceiling(ends) - floor(starts)
  dayIndex <- sequence(nrows)   # 1..n within each observation
  nDays <- rep(nrows, nrows)    # n repeated for each piece

  # ...and create the expanded data frame based on this
  df.out <- data.frame(start_time = rep(starts, nrows) + dayIndex - 1,
                       end_time = rep.int(ends, nrows) - (nDays - dayIndex),
                       valCol = rep.int(valCol, nrows),
                       tDiffs = rep.int(ends - starts, nrows))

  # identify non-original starts and finishes by position rather than by
  # value, so a shifted time that happens to equal another row's start or
  # end is still snapped to midnight
  startIndex <- dayIndex > 1
  endIndex <- dayIndex < nDays

  # floor or ceiling accordingly
  df.out$start_time[startIndex] <- floor(df.out$start_time[startIndex])
  df.out$end_time[endIndex] <- ceiling(df.out$end_time[endIndex])

  # calculate each piece's energy in proportion to its share of the interval
  df.out$energy <- with(df.out, valCol * (end_time - start_time) / tDiffs)

  # reformat cols; the result has one row per observation per day, so sum
  # energy by date if several observations touch the same day
  df.out$date <- chron(floor(df.out$start_time), out.format = 'y-m-d')
  df.out[c("date", "energy")]
}
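For comparison, here is a minimal base R sketch of the same proportional split that also sums the pieces by date. It is not the code above: `split_energy_by_day` is a hypothetical helper, and it assumes POSIXct inputs, day boundaries at UTC midnight, end_time strictly after start_time, and a constant consumption rate within each interval.

```r
# Hypothetical helper (not part of the original answer).
# Assumptions: POSIXct inputs, days split at UTC midnight,
# end_time > start_time, constant consumption rate within an interval.
split_energy_by_day <- function(start_time, end_time, energy) {
  # Work in fractional days since 1970-01-01 UTC, so midnights are integers.
  s <- as.numeric(start_time) / 86400
  e <- as.numeric(end_time) / 86400

  pieces <- lapply(seq_along(s), function(i) {
    days <- floor(s[i]):floor(e[i])   # every calendar day the interval touches
    lo <- pmax(days, s[i])            # piece start: midnight or the real start
    hi <- pmin(days + 1, e[i])        # piece end: next midnight or the real end
    data.frame(date = as.Date(days, origin = "1970-01-01"),
               energy = energy[i] * (hi - lo) / (e[i] - s[i]))
  })

  # Stack the pieces and add up everything that lands on the same date.
  aggregate(energy ~ date, data = do.call(rbind, pieces), FUN = sum)
}

# The example row from the question
df <- data.frame(
  start_time = as.POSIXct("2014-06-09 20:54:10", tz = "UTC"),
  end_time   = as.POSIXct("2014-06-11 05:04:14", tz = "UTC"),
  energy     = 1114
)
daily <- split_energy_by_day(df$start_time, df$end_time, df$energy)
print(daily)
```

This gives one row per date (2014-06-09, 2014-06-10, 2014-06-11), and the three values add back up to 1114. If your timestamps are in a local time zone, the day boundaries would need to be computed in that zone instead.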