Splitting a time frame into dates in R

Time: 2015-07-20 20:54:54

Tags: r date

My data frame has 3 columns, start_time, end_time and energy, where start_time and end_time are in datetime format and energy is the energy consumed between those two times.

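My goal is to calculate the energy consumed on each day. For instances where start_time and end_time fall on the same date, the energy value is simply assigned to that date. But I need a way to split up the energy values of instances where start_time and end_time fall on different dates. For example, an instance like this in the data frame: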

start_time             end_time               energy
2014-06-09 20:54:10    2014-06-11 05:04:14    1114

should produce instances like these in the output data frame:

date        energy
2014-06-09  <energy consumed from 2014-06-09 20:54:10 to 2014-06-09 23:59:59>
2014-06-10  <energy consumed from 2014-06-10 00:00:00 to 2014-06-10 23:59:59>
2014-06-11  <energy consumed from 2014-06-11 00:00:00 to 2014-06-11 05:04:14>
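The question does not say how energy should be divided within an interval. One reasonable assumption (and the one the answer below also makes) is a constant consumption rate, so each day receives the same fraction of `energy` as the fraction of the interval's seconds that fall on that day. A minimal base-R sketch for the sample row, assuming the times are UTC and hard-coding the two interior midnights for this one row:

```r
# Sample row from the question. Times are parsed as UTC (an assumption) so
# that every day is exactly 86400 seconds long.
start  <- as.numeric(as.POSIXct("2014-06-09 20:54:10", tz = "UTC"))
end    <- as.numeric(as.POSIXct("2014-06-11 05:04:14", tz = "UTC"))
energy <- 1114

# Interior day boundaries (midnights) for this particular row only
midnights <- as.numeric(as.POSIXct(c("2014-06-10", "2014-06-11"), tz = "UTC"))

# Seconds spent on each calendar day: start -> midnight -> midnight -> end
seconds <- diff(c(start, midnights, end))
seconds
# 11150 86400 18254

# Each day's share, assuming a constant consumption rate over the interval
shares <- energy * seconds / (end - start)
round(shares, 2)
# 107.26 831.14 175.60
```

The three shares add back up to 1114. Splitting at 00:00:00 rather than ending each day at 23:59:59 also avoids losing one second at every day boundary.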

1 answer:

Answer 0 (score: 0)

I haven't tested this much (the data frame provided is a bit sparse...), but it seems to work.

calcEnergy <- function(startCol, endCol, valCol) {
    require(chron)
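    # convert start and finish times to chron; unlist() drops the chron class,
    # leaving plain numeric day counts that floor()/ceiling() snap to midnight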
    chron.fun <- function(x) chron(x[1], x[2], format=c('y-m-d','h:m:s'))
    starts <- unlist(lapply(strsplit(as.character(startCol), " "), chron.fun))
    ends <- unlist(lapply(strsplit(as.character(endCol), " "), chron.fun))
    # need to expand dataframe out to accommodate new rows, so calculate number of
    # rows (one per calendar day touched) per original observation
    nrows <- ceiling(ends) - floor(starts)
    # ..& create expanded dataframe based on this
    df.out <- data.frame(start_time = rep(starts, nrows) + sequence(nrows)-1,
                       end_time = rep.int(ends, nrows) - (rep(nrows,nrows) -sequence(nrows)),
                       valCol = rep.int(valCol, nrows),
                       tDiffs = rep.int(ends - starts, nrows))
    # identify pieces by position, not by floating-point value: only the first
    # piece of each original row keeps its true start and only the last piece
    # keeps its true end
    piece <- sequence(nrows)
    startIndex <- piece != 1
    endIndex <- piece != rep(nrows, nrows)
    # floor or ceiling the interior boundaries to midnight
    df.out$start_time[startIndex] <- floor(df.out$start_time[startIndex])
    df.out$end_time[endIndex] <- ceiling(df.out$end_time[endIndex])
    # calculate proportion energy per day
    df.out$energy <- with(df.out, valCol*(end_time-start_time)/tDiffs)
    # reformat cols
    df.out$date <- chron(floor(df.out$start_time), out.format='y-m-d')
    df.out[c("date", "energy")]
}
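If the chron dependency is unwanted, the same proportional split can be sketched in base R. This is an alternative, not the answer's code: it assumes start_time and end_time are POSIXct in UTC and that every end is after its start, `split_by_day()` and `energy_per_day()` are hypothetical helper names, and the second input row is made up purely for illustration. Unlike calcEnergy(), which returns one row per piece, this sketch also totals the pieces per date with aggregate(), because several input rows can contribute to the same day.

```r
# Hypothetical helper: split one interval [start, end) at each UTC midnight and
# share `energy` out in proportion to the seconds spent on each day.
# Assumes end > start.
split_by_day <- function(start, end, energy) {
  day <- 86400
  s <- as.numeric(start)
  e <- as.numeric(end)
  # midnights strictly between start and end
  first <- (floor(s / day) + 1) * day
  cuts <- if (first < e) seq(first, e, by = day) else numeric(0)
  cuts <- cuts[cuts < e]
  bounds <- c(s, cuts, e)
  data.frame(date = as.Date(floor(head(bounds, -1) / day), origin = "1970-01-01"),
             energy = energy * diff(bounds) / (e - s))
}

# Hypothetical wrapper: split every row, then total the energy per date
energy_per_day <- function(df) {
  pieces <- Map(split_by_day, df$start_time, df$end_time, df$energy)
  aggregate(energy ~ date, data = do.call(rbind, pieces), FUN = sum)
}

# First row is from the question; the second row is made up to show aggregation
df <- data.frame(
  start_time = as.POSIXct(c("2014-06-09 20:54:10", "2014-06-10 08:00:00"), tz = "UTC"),
  end_time   = as.POSIXct(c("2014-06-11 05:04:14", "2014-06-10 09:30:00"), tz = "UTC"),
  energy     = c(1114, 50)
)
res <- energy_per_day(df)
res
```

Working in plain seconds since the epoch keeps the day arithmetic exact; with a local time zone that observes daylight saving, some days would not be 86400 seconds long and the midnight calculation would need to change.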