How to get the sum of one column based on TRUE / FALSE in another column

Time: 2019-10-03 08:16:13

Tags: r

I am trying to summarise my data into daily peak and off-peak totals. Some times of day fall outside the peak period.

Date        Time        Value
2019-09-01  00:00:00    0.34
2019-09-01  00:30:00    0.34
2019-09-01  01:00:00    0.34
2019-09-01  01:30:00    0.38
2019-09-01  02:00:00    0.34
2019-09-01  02:30:00    0.34
2019-09-01  03:00:00    0.34
2019-09-01  03:30:00    0.34
2019-09-01  04:00:00    0.34
2019-09-01  04:30:00    0.34
2019-09-01  05:00:00    0.34
2019-09-01  05:30:00    0.34
2019-09-01  06:00:00    0.41
2019-09-01  06:30:00    0.53
2019-09-01  07:00:00    0.56
2019-09-01  07:30:00    0.56
2019-09-01  08:00:00    0.53
2019-09-01  08:30:00    0.66
2019-09-01  09:00:00    1.03
2019-09-01  09:30:00    1.03

I have added a TRUE / FALSE Peak column to the data frame using this:

Data$Peak <- Data$Time > "07:00:00" & Data$Time <= "23:00:00" & !grepl("S.+", weekdays(Data$Date))

This almost does what I want. All the values are there, but in one long list.

Day_Summary <- aggregate(Data$Value, by=list(Data$Date, Data$Peak), FUN=sum)

I have also tried summarize and mutate, but did not get what I wanted. Any help would be great.

I would like the data to look like this.

Date, Peak, OffPeak
2019-09-01, 156, 36
2019-09-02, 145, 56
2019-09-02, 180, 0

2 answers:

Answer 0: (score: 0)

Using data.table, and assuming that Monday to Friday 07-23 is peak and the rest of the week is off-peak...

Sample data

library(data.table)

#create sample data
dt <- fread("Date   Time    Value
2019-09-02  00:00:00    0.34
2019-09-02  00:30:00    0.34
2019-09-02  01:00:00    0.34
2019-09-02  01:30:00    0.38
2019-09-02  02:00:00    0.34
2019-09-02  02:30:00    0.34
2019-09-02  03:00:00    0.34
2019-09-02  03:30:00    0.34
2019-09-02  04:00:00    0.34
2019-09-02  04:30:00    0.34
2019-09-02  05:00:00    0.34
2019-09-02  05:30:00    0.34
2019-09-02  06:00:00    0.41
2019-09-02  06:30:00    0.53
2019-09-02  07:00:00    0.56
2019-09-02  07:30:00    0.56
2019-09-02  08:00:00    0.53
2019-09-02  08:30:00    0.66
2019-09-02  09:00:00    1.03
2019-09-02  09:30:00    1.03")
#set Date as IDate and Time as ITime
dt[, `:=`( Date = as.IDate( Date ),
           Time = as.ITime( Time ) )]

Code

#NB, in data.table::wday, Sunday = 1 !! 
#create column with peak/off-peak
#assuming peak = Mon-Fri 7-23
#initialise period column, all = "off-peak"
dt[, period := "off-peak" ]
#update period-column peak-period entries to "peak"
dt[ !data.table::wday( Date ) %in% c(1,7) & 
      Time %between% c( as.ITime( "07:00:00" ), as.ITime( "23:00:00" ) ),
    period := "peak"]
#summarise
ans <- dt[, .( sum = sum( Value ) ), by = .( Date, period ) ]
#cast to wide
dcast( ans, Date ~ period, value.var = "sum", fill = 0 )

Output

#          Date off-peak peak
# 1: 2019-09-02     5.06 4.37
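
One detail worth checking against the question: `%between%` includes both bounds, so the 07:00:00 reading counts as peak here, whereas the question's rule (`Time > "07:00:00"`) treats it as off-peak. A minimal sketch of the difference, assuming a hypothetical three-row sample around the boundary; the column names `closed` and `half_open` are made up for illustration:

```r
library(data.table)

# Hypothetical three-row sample around the 07:00 boundary (a Monday)
dt <- data.table(
  Date  = as.IDate("2019-09-02"),
  Time  = as.ITime(c("06:30:00", "07:00:00", "07:30:00")),
  Value = c(0.53, 0.56, 0.56)
)

# Closed interval, as in the answer: 07:00:00 counts as peak
dt[, closed := Time %between% c(as.ITime("07:00:00"), as.ITime("23:00:00"))]

# Half-open interval (07:00, 23:00], as in the question's rule
dt[, half_open := Time > as.ITime("07:00:00") & Time <= as.ITime("23:00:00")]

dt
```

Which of the two is right depends on how the tariff defines the start of the peak period.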

Answer 1: (score: 0)

You can create a full timestamp column Time2, i.e. date and time in "POSIXct" format. I made some sample data DF below.

DF$Time2 <- as.POSIXct(sapply(1:nrow(DF), function(x) Reduce(paste, DF[x, c("Date", "Time")])))
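
Since paste() is vectorised, the same column can probably be built without the row-wise sapply(). The sketch below also sets the time zone explicitly, because as.Date() on a "POSIXct" value may convert via UTC and then disagree with the original Date near midnight. The two-row sample is hypothetical, and `tz = "UTC"` is an assumption about how the timestamps should be read:

```r
# Hypothetical two-row sample with character Date and Time columns
DF <- data.frame(
  Date  = c("2019-01-01", "2019-01-01"),
  Time  = c("00:00:00", "23:30:00"),
  Value = c(0.03, 5),
  stringsAsFactors = FALSE
)

# paste() is vectorised and converts factor columns to character as well
DF$Time2 <- as.POSIXct(paste(DF$Date, DF$Time), tz = "UTC")

# With an explicit UTC time zone the calendar date survives the round trip
as.Date(DF$Time2)
format(DF$Time2, "%H:%M:%S")
```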

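From Time2 you can use format(), following this solution, to create an hour-minute-second column hms. The trick is that hms holds only the time without the date, which helps to identify Peak.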

DF$hms <- format(as.POSIXct(DF$Time2), "%H:%M:%S")
DF$Peak <- with(DF, hms > "07:00:00" & hms <= "23:00:00" & !grepl("S.+", weekdays(Time2)))
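
The `grepl("S.+", weekdays(...))` test relies on English day names, so it can misclassify days in sessions that use another language. A locale-independent sketch, using hypothetical dates and the made-up name `is_weekend`:

```r
# Hypothetical dates: a Friday, a Saturday, a Sunday and a Monday
d <- as.Date(c("2019-09-06", "2019-09-07", "2019-09-08", "2019-09-09"))

# POSIXlt stores the day of the week as 0 (Sunday) to 6 (Saturday),
# independent of the language of the session
is_weekend <- as.POSIXlt(d)$wday %in% c(0, 6)
is_weekend
```

`!is_weekend` could then take the place of the `!grepl(...)` term in the Peak condition.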

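Finally, we run two aggregate() calls: the first uses a similar "trick" as before, this time with as.Date to strip the time; the second rearranges the result. setNames() assigns readable names. (As noted in this answer, we should also wrap do.call(data.frame, .) around it to get a clean structure.)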

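# sum Value per Peak flag and calendar date (long format)
a1 <- with(DF, aggregate(Value, list(Peak=Peak, Date=as.Date(Time2)), sum))
# a1[-1, ] drops the first row of a1; as.Date() on a POSIXct value may convert
# via UTC, which can push the earliest slots of a day onto the previous date
res <- setNames(do.call(data.frame, 
                        aggregate(x ~ Date, a1[-1, ], I)
                        ),
                c("Date", "OffPeak", "Peak"))[c(1, 3, 2)]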

Result

res
#         Date  Peak OffPeak
# 1 2019-01-01 41.38    8.51
# 2 2019-01-02 49.12   11.41
# 3 2019-01-03 37.38    6.46
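
The layout asked for in the question (one row per date, zero where a period has no readings) can also be sketched in base R with xtabs(), grouping on the Date column itself rather than on as.Date(Time2). The totals printed above appear to count the 00:00 and 00:30 readings towards the previous date, which would be consistent with as.Date() converting via UTC in a non-UTC session; grouping on Date sidesteps that. The toy data and the names `weekend`, `peak`, `Period` and `wide` are assumptions for illustration only:

```r
# Hypothetical toy data: a Friday and a Saturday (values invented)
Data <- data.frame(
  Date  = as.Date(c("2019-09-06", "2019-09-06", "2019-09-06",
                    "2019-09-07", "2019-09-07")),
  Time  = c("06:30:00", "07:30:00", "23:00:00", "06:30:00", "07:30:00"),
  Value = c(0.5, 1, 2, 0.25, 0.75),
  stringsAsFactors = FALSE
)

# Peak = weekday and time in (07:00, 23:00]; wday is 0 (Sunday) to 6 (Saturday)
weekend <- as.POSIXlt(Data$Date)$wday %in% c(0, 6)
peak <- Data$Time > "07:00:00" & Data$Time <= "23:00:00" & !weekend

# Fixed factor levels keep both columns even when a day has no peak rows
Data$Period <- factor(peak, levels = c(TRUE, FALSE),
                      labels = c("Peak", "OffPeak"))

# xtabs() sums Value per Date and Period; empty cells become 0
wide <- as.data.frame.matrix(xtabs(Value ~ Date + Period, data = Data))
res <- data.frame(Date = rownames(wide), wide, row.names = NULL)
res
```

Here the Saturday row gets a Peak total of 0, matching the third row of the desired output in the question.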

Data

DF <- structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("2019-01-01", 
"2019-01-02", "2019-01-03"), class = "factor"), Time = structure(c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 
16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 
29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 
42L, 43L, 44L, 45L, 46L, 47L, 48L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 
8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 
21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 
34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 
47L, 48L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 
13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 
26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 
39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L), .Label = c("00:00:00", 
"00:30:00", "01:00:00", "01:30:00", "02:00:00", "02:30:00", "03:00:00", 
"03:30:00", "04:00:00", "04:30:00", "05:00:00", "05:30:00", "06:00:00", 
"06:30:00", "07:00:00", "07:30:00", "08:00:00", "08:30:00", "09:00:00", 
"09:30:00", "10:00:00", "10:30:00", "11:00:00", "11:30:00", "12:00:00", 
"12:30:00", "13:00:00", "13:30:00", "14:00:00", "14:30:00", "15:00:00", 
"15:30:00", "16:00:00", "16:30:00", "17:00:00", "17:30:00", "18:00:00", 
"18:30:00", "19:00:00", "19:30:00", "20:00:00", "20:30:00", "21:00:00", 
"21:30:00", "22:00:00", "22:30:00", "23:00:00", "23:30:00"), class = "factor"), 
    Value = c(0.03, 0.04, 0.06, 0.1, 0.2, 0.22, 0.23, 0.28, 0.28, 
    0.31, 0.31, 0.35, 0.35, 0.37, 0.39, 0.39, 0.41, 0.44, 0.47, 
    0.48, 0.5, 0.57, 0.62, 0.66, 0.66, 0.67, 0.71, 0.72, 0.74, 
    0.78, 1.19, 1.2, 1.21, 1.25, 1.25, 1.29, 1.31, 1.34, 1.42, 
    1.46, 1.52, 1.76, 1.9, 2.41, 3.02, 4.17, 4.86, 5, 0.03, 0.03, 
    0.07, 0.13, 0.15, 0.16, 0.16, 0.18, 0.22, 0.22, 0.24, 0.25, 
    0.29, 0.33, 0.4, 0.42, 0.44, 0.45, 0.47, 0.47, 0.49, 0.5, 
    0.51, 0.55, 0.63, 0.64, 0.66, 0.67, 0.91, 1.03, 1.06, 1.12, 
    1.12, 1.13, 1.2, 1.27, 1.34, 1.45, 1.54, 1.57, 1.65, 1.75, 
    2.36, 2.51, 5.71, 6.65, 6.85, 8.46, 0.07, 0.08, 0.09, 0.09, 
    0.09, 0.1, 0.12, 0.17, 0.18, 0.22, 0.3, 0.36, 0.36, 0.38, 
    0.44, 0.46, 0.46, 0.48, 0.49, 0.49, 0.54, 0.55, 0.56, 0.57, 
    0.59, 0.65, 0.73, 0.77, 0.79, 0.8, 0.84, 0.99, 1.04, 1.11, 
    1.27, 1.34, 1.35, 1.42, 1.42, 1.82, 1.88, 1.89, 1.94, 1.96, 
    2.24, 2.85, 3.09, 3.56)), row.names = c(NA, -144L), class = "data.frame")