Complex aggregation of a table

Time: 2014-05-28 11:43:04

Tags: r

How can I perform a complex aggregation of this table?

df <- structure(list(Operator = c("Ivan", "Eugene", "Ivan", "Ivan", 
                            "Eugene", "Petr"),
               begin_time = c("02-01-2014 21:59", "01-01-2014 10:30", "04-01-2014 13:18",
                              "08-01-2014 17:45", "03-01-2014 00:38", "10-01-2014 12:16"),
               end_time = c("04-01-2014 16:01", "03-01-2014 20:20", "05-01-2014 17:14",
                            "11-01-2014 22:30", "06-01-2014 23:59", "11-01-2014 02:15"),
               number_of_tickets = c(2L, 1L, 3L, 4L, 5L, 7L)),
          .Names = c("Operator", "begin_time", "end_time", "number_of_tickets"),
          class = "data.frame", row.names = c(NA, -6L))

df
  Operator       begin_time         end_time number_of_tickets
1     Ivan 02-01-2014 21:59 04-01-2014 16:01                 2
2   Eugene 01-01-2014 10:30 03-01-2014 20:20                 1
3     Ivan 04-01-2014 13:18 05-01-2014 17:14                 3
4     Ivan 08-01-2014 17:45 11-01-2014 22:30                 4
5   Eugene 03-01-2014 00:38 06-01-2014 23:59                 5
6     Petr 10-01-2014 12:16 11-01-2014 02:15                 7

Grouped by Operator, I need the minimum of begin_time, the maximum of end_time, and the sum of number_of_tickets.

Thanks.

2 answers:

Answer 0: (score: 1)

Assuming your data.frame is named df, this should do what you describe (using dplyr).

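library(dplyr)

# %>% replaces the old %.% operator, which was deprecated and later removed from dplyr
df %>%
  # parse the day-month-year strings into date-times so min()/max() compare chronologically
  mutate(end_time = as.POSIXct(end_time, format = "%d-%m-%Y %H:%M"),
         begin_time = as.POSIXct(begin_time, format = "%d-%m-%Y %H:%M")) %>%
  group_by(Operator) %>%
  summarize(min_begin_time = min(begin_time),
            max_end_time = max(end_time),
            sum_tickets = sum(number_of_tickets))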


#  Operator      min_begin_time        max_end_time sum_tickets
#1   Eugene 2014-01-01 10:30:00 2014-01-06 23:59:00           6
#2     Ivan 2014-01-02 21:59:00 2014-01-11 22:30:00           9
#3     Petr 2014-01-10 12:16:00 2014-01-11 02:15:00           7
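
The conversion with as.POSIXct matters because the columns are stored as day-month-year text, and min()/max() on character vectors compare alphabetically rather than chronologically. A small sketch of the difference, using two made-up timestamps that are not from the question's data (tz = "UTC" is also an assumption, added only to make the result independent of the local time zone):

```r
# Two hypothetical timestamps in the question's "dd-mm-YYYY HH:MM" format:
# 2 January 2014 and 1 February 2014.
x <- c("02-01-2014 21:59", "01-02-2014 10:30")

# Compared as text, "01-..." sorts before "02-...", so 1 February "wins".
as_text_min <- min(x)

# Parsed as date-times, 2 January is correctly the earliest.
parsed <- as.POSIXct(x, format = "%d-%m-%Y %H:%M", tz = "UTC")
as_time_min <- min(parsed)

as_text_min                                   # "01-02-2014 10:30"
format(as_time_min, "%d-%m-%Y %H:%M")         # "02-01-2014 21:59"
```

In the question's sample all rows fall in January 2014, so the text comparison would happen to agree there, but it would not be safe in general.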

Answer 1: (score: 1)

Using data.table:

library(data.table)
setDT(df)[, list(begin_time = min(as.POSIXct(begin_time, format = "%d-%m-%Y %H:%M")),
                 end_time = max(as.POSIXct(end_time, format = "%d-%m-%Y %H:%M")),
                 number_of_tickets = sum(number_of_tickets)), by = Operator]

#    Operator          begin_time            end_time number_of_tickets
# 1:     Ivan 2014-01-02 21:59:00 2014-01-11 22:30:00                 9
# 2:   Eugene 2014-01-01 10:30:00 2014-01-06 23:59:00                 6
# 3:     Petr 2014-01-10 12:16:00 2014-01-11 02:15:00                 7
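
If neither package is available, the same grouping can probably be done in base R as well. The following is only a sketch under a few assumptions: the data are rebuilt with data.frame() instead of the question's structure() call, the result name and the column names min_begin_time, max_end_time and sum_tickets are illustrative, and tz = "UTC" is added to keep the output independent of the local time zone.

```r
# Rebuild the question's data as a plain data.frame.
df <- data.frame(
  Operator = c("Ivan", "Eugene", "Ivan", "Ivan", "Eugene", "Petr"),
  begin_time = c("02-01-2014 21:59", "01-01-2014 10:30", "04-01-2014 13:18",
                 "08-01-2014 17:45", "03-01-2014 00:38", "10-01-2014 12:16"),
  end_time = c("04-01-2014 16:01", "03-01-2014 20:20", "05-01-2014 17:14",
               "11-01-2014 22:30", "06-01-2014 23:59", "11-01-2014 02:15"),
  number_of_tickets = c(2L, 1L, 3L, 4L, 5L, 7L),
  stringsAsFactors = FALSE
)

# Parse the text columns into date-times before aggregating.
fmt <- "%d-%m-%Y %H:%M"
df$begin_time <- as.POSIXct(df$begin_time, format = fmt, tz = "UTC")
df$end_time   <- as.POSIXct(df$end_time,   format = fmt, tz = "UTC")

# Split by Operator, summarise each piece, and bind the pieces back together.
result <- do.call(rbind, lapply(split(df, df$Operator), function(g) {
  data.frame(Operator       = g$Operator[1],
             min_begin_time = min(g$begin_time),
             max_end_time   = max(g$end_time),
             sum_tickets    = sum(g$number_of_tickets),
             stringsAsFactors = FALSE)
}))
rownames(result) <- NULL

result
```

Because split() orders the groups by sorted Operator name, the rows come out as Eugene, Ivan, Petr, matching the dplyr output rather than the order of first appearance used by data.table.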