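I want to sum the values that fall within the same time segment. Any value that occurs more than 6 hours after the previous value should start a new segment. I also want to count the hours in each segment and compute each segment's maximum and median.
Here is the sample data: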
Date <- c("1954-10-07", "1954-10-07", "1954-10-07", "1954-10-07", "1954-10-07", "1954-10-07", "1954-10-11", "1954-10-11", "1954-10-11", "1954-10-12", "1954-10-13")
Time <- c("0:00", "1:00", "4:00", "13:00", "14:00", "15:00", "9:00","10:00", "11:00", "23:00", "0:00")
DateTime <- paste(Date, Time)
Value <- c(0.1, 0.2, 0.1, 0.02, 0.2, 1.1, 0.2, 0.3, 0.4, 0.1, 0.05)
df <- data.frame(Date, Time, DateTime, Value)
df
Date        Time   DateTime          Value
1954-10-07  0:00   1954-10-07 0:00   0.10
1954-10-07  1:00   1954-10-07 1:00   0.20
1954-10-07  4:00   1954-10-07 4:00   0.10
1954-10-07  13:00  1954-10-07 13:00  0.02
1954-10-07  14:00  1954-10-07 14:00  0.20
1954-10-07  15:00  1954-10-07 15:00  1.10
1954-10-11  9:00   1954-10-11 9:00   0.20
1954-10-11  10:00  1954-10-11 10:00  0.30
1954-10-11  11:00  1954-10-11 11:00  0.40
1954-10-12  23:00  1954-10-12 23:00  0.10
1954-10-13  0:00   1954-10-13 0:00   0.05
Desired output:
IntervalStart     IntervalEnd       ValueSum  ValueMax  ValueMedian  HoursinSegment
1954-10-07 0:00   1954-10-07 4:00   0.4       0.2       0.1          4
1954-10-07 13:00  1954-10-07 14:00  1.32      1.10      0.2          3
1954-10-11 9:00   1954-10-11 10:00  0.5       0.30      0.25         1
1954-10-12 23:00  1954-10-13 0:00   0.15      0.1       0.75         1
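I think the tricky part for me is the timestamps, because some values fall on the next day but are still within 6 hours of the previous value. Thanks for your help!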
Answer 0 (score: 2)
I think this does what you need:
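library(data.table)
# Convert df to a data.table by reference and parse DateTime as POSIXct
setDT(df)[, DateTime := as.POSIXct(sprintf("%s:00", DateTime))]
# Start a new group whenever the gap to the previous row exceeds 6 hours
df[, Grp := cumsum(c(0, difftime(DateTime[-1], head(DateTime, -1), units = "hours")) > 6)]
# Summarise each group; units must be named, since the third positional
# argument of difftime() is tz, not units
df[, .(Start = min(DateTime),
       End = max(DateTime),
       Min = min(Value),
       Max = max(Value),
       Median = median(Value),
       Span = difftime(max(DateTime), min(DateTime), units = "hours")),
   by = "Grp"]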
# Grp Start End Min Max Median Span
# 1: 0 1954-10-07 00:00:00 1954-10-07 04:00:00 0.10 0.2 0.100 4 hours
# 2: 1 1954-10-07 13:00:00 1954-10-07 15:00:00 0.02 1.1 0.200 2 hours
# 3: 2 1954-10-11 09:00:00 1954-10-11 11:00:00 0.20 0.4 0.300 2 hours
# 4: 3 1954-10-12 23:00:00 1954-10-13 00:00:00 0.05 0.1 0.075 1 hours
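The summary above reports Min but not the sum the question asks for. Below is a minimal sketch of one way to add it, using the column names from the desired output. It rebuilds the sample data so it can run on its own; the object names `dt`, `gap_h` and `res`, and the choice of `tz = "UTC"` (to sidestep daylight-saving gaps), are my assumptions and not part of the original answer.

```r
library(data.table)

# Sample data from the question; tz = "UTC" is an assumption
dt <- data.table(
  DateTime = as.POSIXct(
    c("1954-10-07 0:00", "1954-10-07 1:00", "1954-10-07 4:00",
      "1954-10-07 13:00", "1954-10-07 14:00", "1954-10-07 15:00",
      "1954-10-11 9:00", "1954-10-11 10:00", "1954-10-11 11:00",
      "1954-10-12 23:00", "1954-10-13 0:00"),
    format = "%Y-%m-%d %H:%M", tz = "UTC"),
  Value = c(0.1, 0.2, 0.1, 0.02, 0.2, 1.1, 0.2, 0.3, 0.4, 0.1, 0.05)
)

# Gap to the previous row in hours; the first row has no predecessor, so use 0
gap_h <- c(0, as.numeric(diff(dt$DateTime), units = "hours"))

# A gap of more than 6 hours opens a new segment
dt[, Grp := cumsum(gap_h > 6)]

# One row per segment, including the sum requested in the question
res <- dt[, .(IntervalStart  = min(DateTime),
              IntervalEnd    = max(DateTime),
              ValueSum       = sum(Value),
              ValueMax       = max(Value),
              ValueMedian    = median(Value),
              HoursinSegment = as.numeric(difftime(max(DateTime), min(DateTime),
                                                   units = "hours"))),
          by = Grp]
res
```

Under the 6-hour rule the third segment includes the 11:00 row, so its sum comes out as 0.9 rather than the 0.5 in the question's hand-written table. HoursinSegment here is the elapsed time between the first and last observation; if you want the number of hourly readings instead, `.N` would be the thing to use.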
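setDT(df)[,DateTime := as.POSIXct(... converts df to a data.table and converts the DateTime column to POSIXct.
df[, Grp := cumsum(c(0, difftime(... creates a group ID based on your condition above, i.e. a new group starts whenever DateTime[i] - DateTime[i - 1] is greater than 6 hours.
df[,.(Start = min(DateTime), ... computes the summary values for each Grp.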
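The grouping step is easier to see in isolation. Here is a small base R sketch on a made-up vector of elapsed hours (the names `hours`, `gap` and `grp` are hypothetical, purely for illustration): each gap larger than 6 is flagged TRUE, and cumsum() turns those flags into a running segment ID.

```r
# Hypothetical toy input: hours elapsed since some reference time
hours <- c(0, 1, 4, 13, 14, 15, 105, 106)

# Gap to the previous observation; the first one has no predecessor, so use 0
gap <- c(0, diff(hours))

# TRUE marks the first row of a new segment; cumsum() counts the flags so far
grp <- cumsum(gap > 6)
grp
# 0 0 0 1 1 1 2 2
```

Because the comparison is made on elapsed time and not on the calendar date, a reading just after midnight stays in the same segment as one just before it.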