I am trying to summarise my data into daily peak and off-peak totals. Some of the time slots fall outside the peak period.
Date Time Value
2019-09-01 00:00:00 0.34
2019-09-01 00:30:00 0.34
2019-09-01 01:00:00 0.34
2019-09-01 01:30:00 0.38
2019-09-01 02:00:00 0.34
2019-09-01 02:30:00 0.34
2019-09-01 03:00:00 0.34
2019-09-01 03:30:00 0.34
2019-09-01 04:00:00 0.34
2019-09-01 04:30:00 0.34
2019-09-01 05:00:00 0.34
2019-09-01 05:30:00 0.34
2019-09-01 06:00:00 0.41
2019-09-01 06:30:00 0.53
2019-09-01 07:00:00 0.56
2019-09-01 07:30:00 0.56
2019-09-01 08:00:00 0.53
2019-09-01 08:30:00 0.66
2019-09-01 09:00:00 1.03
2019-09-01 09:30:00 1.03
I have added a Peak TRUE/FALSE column to the data frame using this:
Data$Peak <- Data$Time > "07:00:00" & Data$Time <= "23:00:00" & !grepl("S.+", weekdays(Data$Date))
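A note on the Peak line above: grepl("S.+", weekdays(Data$Date)) relies on English day names (Saturday/Sunday), so it silently stops flagging weekends under a non-English locale. A minimal locale-independent sketch, assuming Date is (or can be converted to) class Date, uses the numeric weekday from as.POSIXlt(), where Sunday = 0 and Saturday = 6. The is_weekend helper and the four toy rows are hypothetical, for illustration only:

```r
# Hypothetical helper: TRUE for Saturday/Sunday, independent of locale.
# as.POSIXlt(...)$wday numbers the days 0-6 with Sunday = 0.
is_weekend <- function(d) {
  as.POSIXlt(as.Date(d))$wday %in% c(0, 6)
}

# Toy rows: 2019-09-01 is a Sunday, 2019-09-02 is a Monday
Data <- data.frame(
  Date  = as.Date(c("2019-09-01", "2019-09-01", "2019-09-02", "2019-09-02")),
  Time  = c("06:30:00", "07:30:00", "06:30:00", "07:30:00"),
  Value = c(0.53, 0.56, 0.53, 0.56),
  stringsAsFactors = FALSE  # keep Time as text so ">" compares strings
)

# Same rule as in the question: after 07:00 up to 23:00, weekdays only
Data$Peak <- Data$Time > "07:00:00" & Data$Time <= "23:00:00" & !is_weekend(Data$Date)
print(Data)
```

Only the Monday 07:30 row ends up as peak; the Sunday rows are off-peak regardless of the session's language settings.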
This almost does what I want: all the values are there, but the result is one long list (one row per date and Peak value).
Day_Summary <- aggregate(Data$Value, by=list(Data$Date, Data$Peak), FUN=sum)
I have also tried summarize and mutate, but did not get what I wanted. Any help would be great.
I would like the data to look like this:
Date, Peak, OffPeak
2019-09-01, 156, 36
2019-09-02, 145, 56
2019-09-02, 180, 0
Answer 0 (score: 0)
Using data.table, and assuming that Monday to Friday 07:00 to 23:00 is peak and the rest of the week is off-peak...
Sample data
library(data.table)
#create sample data
dt <- fread("Date Time Value
2019-09-02 00:00:00 0.34
2019-09-02 00:30:00 0.34
2019-09-02 01:00:00 0.34
2019-09-02 01:30:00 0.38
2019-09-02 02:00:00 0.34
2019-09-02 02:30:00 0.34
2019-09-02 03:00:00 0.34
2019-09-02 03:30:00 0.34
2019-09-02 04:00:00 0.34
2019-09-02 04:30:00 0.34
2019-09-02 05:00:00 0.34
2019-09-02 05:30:00 0.34
2019-09-02 06:00:00 0.41
2019-09-02 06:30:00 0.53
2019-09-02 07:00:00 0.56
2019-09-02 07:30:00 0.56
2019-09-02 08:00:00 0.53
2019-09-02 08:30:00 0.66
2019-09-02 09:00:00 1.03
2019-09-02 09:30:00 1.03")
#convert Date to IDate and Time to ITime
dt[, `:=`( Date = as.IDate( Date ),
Time = as.ITime( Time ) )]
Code
#NB, in data.table::wday, Sunday = 1 !!
#create column with peak/off-peak
#assuming peak = Mon-Fri 7-23
#initialise period column, all = "off-peak"
dt[, period := "off-peak" ]
#update period-column peak-period entries to "peak"
dt[ !data.table::wday( Date ) %in% c(1,7) &
Time %between% c( as.ITime( "07:00:00" ), as.ITime( "23:00:00" ) ),
period := "peak"]
#summarise
ans <- dt[, .( sum = sum( Value ) ), by = .( Date, period ) ]
#cast to wide
dcast( ans, Date ~ period, value.var = "sum", fill = 0 )
Output
# Date off-peak peak
# 1: 2019-09-02 5.06 4.37
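One detail worth checking: data.table's %between% includes both endpoints by default, so the 07:00 slot counts as peak in this answer, whereas the question's Time > "07:00:00" excludes it. The base-R sketch below (no data.table; the peak_sum helper is hypothetical) recomputes the Monday 2019-09-02 sample both ways by comparing the zero-padded "HH:MM:SS" strings directly. The weekday condition is omitted because the sample is a single Monday:

```r
# Half-hourly values for Monday 2019-09-02, copied from the sample data above
times  <- sprintf("%02d:%02d:00", rep(0:9, each = 2), c(0L, 30L))
values <- c(0.34, 0.34, 0.34, 0.38, 0.34, 0.34, 0.34, 0.34, 0.34, 0.34,
            0.34, 0.34, 0.41, 0.53, 0.56, 0.56, 0.53, 0.66, 1.03, 1.03)

# Inclusive bounds, like %between% (07:00 and 23:00 count as peak)
peak_incl <- times >= "07:00:00" & times <= "23:00:00"
# The question's rule: strictly after 07:00, up to and including 23:00
peak_excl <- times >  "07:00:00" & times <= "23:00:00"

# Hypothetical helper: total peak and off-peak value for a logical flag
peak_sum <- function(flag) {
  c(peak = sum(values[flag]), off_peak = sum(values[!flag]))
}

round(peak_sum(peak_incl), 2)  # peak 4.37, off_peak 5.06 (same as the output above)
round(peak_sum(peak_excl), 2)  # peak 3.81, off_peak 5.62
```

The two rules differ by exactly the 07:00 value (0.56), so it is worth deciding which boundary convention the tariff actually uses.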
Answer 1 (score: 0)
You can create a detailed time column Time2, i.e. date and time combined in "POSIXct" format. I made some sample data DF below.
DF$Time2 <- as.POSIXct(sapply(1:nrow(DF), function(x) Reduce(paste, DF[x, c("Date", "Time")])))
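The sapply()/Reduce() loop above builds one string per row. Because paste() is already vectorised, a shorter equivalent pastes the whole columns at once. This is a sketch assuming Date and Time hold "YYYY-MM-DD" and "HH:MM:SS" text or factors, as in DF; the DF_small rows are made up, and tz = "UTC" is an added assumption that sidesteps daylight-saving gaps in the local time zone:

```r
DF_small <- data.frame(
  Date = factor(c("2019-01-01", "2019-01-01", "2019-01-02")),
  Time = factor(c("00:00:00", "07:30:00", "23:30:00"))
)

# paste() works element-wise on whole columns (factors are coerced to their
# labels), so no per-row loop is needed
DF_small$Time2 <- as.POSIXct(paste(DF_small$Date, DF_small$Time), tz = "UTC")
DF_small$Time2
```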
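From Time2 you can apply format(), using this solution, to create an hour-minute-second column hms. The trick here is that hms shows only the time of day without the date, which makes it easy to determine Peak.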
DF$hms <- format(as.POSIXct(DF$Time2), "%H:%M:%S")
DF$Peak <- with(DF, hms > "07:00:00" & hms <= "23:00:00" & !grepl("S.+", weekdays(Time2)))
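Comparing hms with "07:00:00" works because format(..., "%H:%M:%S") always yields zero-padded, fixed-width text, so alphabetical order matches chronological order. The small sketch below uses made-up timestamps (UTC assumed) to show the extracted strings and the resulting time-of-day test; the weekday part of the condition is left out:

```r
Time2 <- as.POSIXct(c("2019-01-01 06:30:00", "2019-01-01 07:00:00",
                      "2019-01-01 07:30:00", "2019-01-01 23:30:00"), tz = "UTC")

# Keep only the time of day; the date part is dropped
hms <- format(Time2, "%H:%M:%S")
hms

# Zero-padded "HH:MM:SS" strings sort like clock times,
# so a plain string comparison selects the peak window
in_window <- hms > "07:00:00" & hms <= "23:00:00"
in_window
```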
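Finally, we do two aggregate() calls: in the first we use a similar "trick" as before, but with as.Date to drop the time; the second rearranges the result into one row per date. setNames() gives the columns nice names. (As explained in this answer, we should also wrap a do.call(data.frame, .) around it to get a clean structure.)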
a1 <- with(DF, aggregate(Value, list(Peak=Peak, Date=as.Date(Time2)), sum))
res <- setNames(do.call(data.frame,
aggregate(x ~ Date, a1[-1], I)  # a1[-1] drops the Peak column
),
c("Date", "OffPeak", "Peak"))[c(1, 3, 2)]
res
# Date Peak OffPeak
# 1 2019-01-01 41.38 8.51
# 2 2019-01-02 49.12 11.41
# 3 2019-01-03 37.38 6.46
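The second aggregate(..., I) call relies on the FALSE row coming before the TRUE row within each date, and setNames() then labels the columns by position. An alternative sketch (a swapped-in technique, not this answer's method) is base R's xtabs(), which sums Value over a Date by period grid, labels the columns by the period names themselves and fills missing combinations with 0. DF_demo and its values are made up for illustration:

```r
DF_demo <- data.frame(
  Date  = as.Date(c("2019-01-01", "2019-01-01", "2019-01-01", "2019-01-02")),
  Peak  = c(FALSE, TRUE, TRUE, TRUE),
  Value = c(0.5, 1.0, 2.0, 3.0)
)

# Name the periods explicitly; fixed levels keep both columns even if one is empty
DF_demo$Period <- factor(ifelse(DF_demo$Peak, "Peak", "OffPeak"),
                         levels = c("Peak", "OffPeak"))

# Sum Value for every Date x Period combination (empty cells become 0)
tab <- xtabs(Value ~ Date + Period, data = DF_demo)

# Convert the table into a plain data frame with one row per date
res_demo <- data.frame(
  Date    = as.Date(rownames(tab)),
  Peak    = as.vector(tab[, "Peak"]),
  OffPeak = as.vector(tab[, "OffPeak"])
)
res_demo
```

Because the columns are looked up by name, the result does not depend on the row order produced by the first aggregate().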
DF <- structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("2019-01-01",
"2019-01-02", "2019-01-03"), class = "factor"), Time = structure(c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L,
29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L,
42L, 43L, 44L, 45L, 46L, 47L, 48L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L,
21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L,
34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L,
47L, 48L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L,
26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L,
39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L), .Label = c("00:00:00",
"00:30:00", "01:00:00", "01:30:00", "02:00:00", "02:30:00", "03:00:00",
"03:30:00", "04:00:00", "04:30:00", "05:00:00", "05:30:00", "06:00:00",
"06:30:00", "07:00:00", "07:30:00", "08:00:00", "08:30:00", "09:00:00",
"09:30:00", "10:00:00", "10:30:00", "11:00:00", "11:30:00", "12:00:00",
"12:30:00", "13:00:00", "13:30:00", "14:00:00", "14:30:00", "15:00:00",
"15:30:00", "16:00:00", "16:30:00", "17:00:00", "17:30:00", "18:00:00",
"18:30:00", "19:00:00", "19:30:00", "20:00:00", "20:30:00", "21:00:00",
"21:30:00", "22:00:00", "22:30:00", "23:00:00", "23:30:00"), class = "factor"),
Value = c(0.03, 0.04, 0.06, 0.1, 0.2, 0.22, 0.23, 0.28, 0.28,
0.31, 0.31, 0.35, 0.35, 0.37, 0.39, 0.39, 0.41, 0.44, 0.47,
0.48, 0.5, 0.57, 0.62, 0.66, 0.66, 0.67, 0.71, 0.72, 0.74,
0.78, 1.19, 1.2, 1.21, 1.25, 1.25, 1.29, 1.31, 1.34, 1.42,
1.46, 1.52, 1.76, 1.9, 2.41, 3.02, 4.17, 4.86, 5, 0.03, 0.03,
0.07, 0.13, 0.15, 0.16, 0.16, 0.18, 0.22, 0.22, 0.24, 0.25,
0.29, 0.33, 0.4, 0.42, 0.44, 0.45, 0.47, 0.47, 0.49, 0.5,
0.51, 0.55, 0.63, 0.64, 0.66, 0.67, 0.91, 1.03, 1.06, 1.12,
1.12, 1.13, 1.2, 1.27, 1.34, 1.45, 1.54, 1.57, 1.65, 1.75,
2.36, 2.51, 5.71, 6.65, 6.85, 8.46, 0.07, 0.08, 0.09, 0.09,
0.09, 0.1, 0.12, 0.17, 0.18, 0.22, 0.3, 0.36, 0.36, 0.38,
0.44, 0.46, 0.46, 0.48, 0.49, 0.49, 0.54, 0.55, 0.56, 0.57,
0.59, 0.65, 0.73, 0.77, 0.79, 0.8, 0.84, 0.99, 1.04, 1.11,
1.27, 1.34, 1.35, 1.42, 1.42, 1.82, 1.88, 1.89, 1.94, 1.96,
2.24, 2.85, 3.09, 3.56)), row.names = c(NA, -144L), class = "data.frame")