How to perform a complex aggregation on a table:
df <- structure(list(Operator = c("Ivan", "Eugene", "Ivan", "Ivan",
"Eugene", "Petr"),
begin_time = c("02-01-2014 21:59", "01-01-2014 10:30", "04-01-2014 13:18",
"08-01-2014 17:45", "03-01-2014 00:38", "10-01-2014 12:16"),
end_time = c("04-01-2014 16:01", "03-01-2014 20:20", "05-01-2014 17:14",
"11-01-2014 22:30", "06-01-2014 23:59", "11-01-2014 02:15"),
number_of_tickets = c(2L, 1L, 3L, 4L, 5L, 7L)),
.Names = c("Operator", "begin_time", "end_time", "number_of_tickets"),
class = "data.frame", row.names = c(NA, -6L))
df
  Operator       begin_time         end_time number_of_tickets
1     Ivan 02-01-2014 21:59 04-01-2014 16:01                 2
2   Eugene 01-01-2014 10:30 03-01-2014 20:20                 1
3     Ivan 04-01-2014 13:18 05-01-2014 17:14                 3
4     Ivan 08-01-2014 17:45 11-01-2014 22:30                 4
5   Eugene 03-01-2014 00:38 06-01-2014 23:59                 5
6     Petr 10-01-2014 12:16 11-01-2014 02:15                 7
For each Operator, I need the minimum of begin_time, the maximum of end_time, and the sum of number_of_tickets.
Thanks.
Answer 0 (score: 1)
Assuming your data.frame is named df, this should do what you describe (using dplyr).
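library(dplyr)
# %>% replaces the older %.% operator, which current dplyr releases no longer provide.
df %>%
  # Parse the "dd-mm-YYYY HH:MM" strings so min/max compare times, not text
  mutate(end_time = as.POSIXct(end_time, format = "%d-%m-%Y %H:%M"),
         begin_time = as.POSIXct(begin_time, format = "%d-%m-%Y %H:%M")) %>%
  group_by(Operator) %>%
  summarize(min_begin_time = min(begin_time),
            max_end_time = max(end_time),
            sum_tickets = sum(number_of_tickets))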
#   Operator      min_begin_time        max_end_time sum_tickets
# 1   Eugene 2014-01-01 10:30:00 2014-01-06 23:59:00           6
# 2     Ivan 2014-01-02 21:59:00 2014-01-11 22:30:00           9
# 3     Petr 2014-01-10 12:16:00 2014-01-11 02:15:00           7
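The as.POSIXct() step matters because begin_time and end_time are stored as day-first text, and min()/max() on character vectors compare text rather than dates. A minimal sketch of the difference, using two made-up timestamps that are not from the question's data (tz = "UTC" is also an assumption, chosen only so the result does not depend on the local time zone):

```r
# Hypothetical values for illustration: 2 February 2014 and 1 March 2014,
# written in the same "dd-mm-YYYY HH:MM" layout as the question's columns.
x <- c("02-02-2014 10:00", "01-03-2014 10:00")

# Character comparison looks at the leading day digits first.
min_as_text <- min(x)

# Parsing first makes min() compare actual points in time.
parsed <- as.POSIXct(x, format = "%d-%m-%Y %H:%M", tz = "UTC")
min_as_time <- format(min(parsed), "%d-%m-%Y %H:%M", tz = "UTC")

min_as_text  # the March timestamp, because "01" sorts before "02"
min_as_time  # the February timestamp, which is the earlier one
```

With the question's six rows the text-based result may happen to look right, but that would be a coincidence of the particular dates.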
Answer 1 (score: 1)
Using data.table:
library(data.table)
setDT(df)[, list(begin_time = min(as.POSIXct(begin_time, format = "%d-%m-%Y %H:%M")),
                 end_time = max(as.POSIXct(end_time, format = "%d-%m-%Y %H:%M")),
                 number_of_tickets = sum(number_of_tickets)),
          by = Operator]
#    Operator          begin_time            end_time number_of_tickets
# 1:     Ivan 2014-01-02 21:59:00 2014-01-11 22:30:00                 9
# 2:   Eugene 2014-01-01 10:30:00 2014-01-06 23:59:00                 6
# 3:     Petr 2014-01-10 12:16:00 2014-01-11 02:15:00                 7
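For comparison, the same per-operator summary can be sketched in base R without extra packages. This is not part of either answer; it is a split/lapply/rbind illustration under a few assumptions: the data is rebuilt with data.frame() so the block runs on its own, tz = "UTC" is chosen arbitrarily, and the names fmt, per_operator and result are made up for the example.

```r
# The question's data, rebuilt so this block is self-contained.
df <- data.frame(
  Operator = c("Ivan", "Eugene", "Ivan", "Ivan", "Eugene", "Petr"),
  begin_time = c("02-01-2014 21:59", "01-01-2014 10:30", "04-01-2014 13:18",
                 "08-01-2014 17:45", "03-01-2014 00:38", "10-01-2014 12:16"),
  end_time = c("04-01-2014 16:01", "03-01-2014 20:20", "05-01-2014 17:14",
               "11-01-2014 22:30", "06-01-2014 23:59", "11-01-2014 02:15"),
  number_of_tickets = c(2L, 1L, 3L, 4L, 5L, 7L),
  stringsAsFactors = FALSE
)

# Parse the timestamps once; the time zone is an assumption, not from the question.
fmt <- "%d-%m-%Y %H:%M"
df$begin_time <- as.POSIXct(df$begin_time, format = fmt, tz = "UTC")
df$end_time   <- as.POSIXct(df$end_time,   format = fmt, tz = "UTC")

# Split by operator, summarise each piece, then stack the one-row results.
per_operator <- lapply(split(df, df$Operator), function(g) {
  data.frame(
    Operator = g$Operator[1],
    min_begin_time = min(g$begin_time),
    max_end_time = max(g$end_time),
    sum_tickets = sum(g$number_of_tickets),
    stringsAsFactors = FALSE
  )
})
result <- do.call(rbind, per_operator)
rownames(result) <- NULL
result
```

split() orders the groups by sorted operator name, so the rows should come out in the same order as the dplyr result rather than in the first-appearance order that data.table uses.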