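Is there a simple way to group a dataset into a simplified data frame based on certain features? I have an algorithm in mind, but is there an existing function in R for this? I tried using dplyr, but it did not work very well...
E.g.:
P.S.: My data is a matrix of more than 1 GB, so I need a more automated process.
Sample data: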
structure(list(Nun = 1:6, Event = c(1L, 1L, 1L, 1L, 2L, 2L),
Time = structure(c(3L, 4L, 5L, 6L, 1L, 2L), .Label = c("11:34",
"11:36", "8:50", "8:52", "8:54", "8:56"), class = "factor"),
User = structure(c(1L, 1L, 1L, 1L, 2L, 2L), .Label = c("U1",
"U7"), class = "factor")), .Names = c("Nun", "Event", "Time",
"User"), class = "data.frame", row.names = c(NA, -6L))
Answer 0 (score: 1)
You can use summarise from the dplyr package:
library(dplyr)
# Assumes Time is a date-time column (e.g. POSIXct); max() and min() fail on a plain factor
your_data_frame %>%
  group_by(User, Event) %>%
  summarise(Duration = max(Time) - min(Time))
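In the question's sample data, Time is a factor of "H:MM" strings, so it probably needs converting before the subtraction will work. Here is a minimal sketch of that, assuming all rows fall on the same day (as.POSIXct fills in the current date) and that dplyr 1.0 or later is installed for the .groups argument. The name df, the UTC time zone and the choice of minutes as the unit are my own assumptions, not something from the original post.

```r
library(dplyr)

# Sample data from the question, rebuilt by hand; Time is a factor of "H:MM" strings
df <- data.frame(
  Nun   = 1:6,
  Event = c(1L, 1L, 1L, 1L, 2L, 2L),
  Time  = factor(c("8:50", "8:52", "8:54", "8:56", "11:34", "11:36")),
  User  = factor(c("U1", "U1", "U1", "U1", "U7", "U7"))
)

result <- df %>%
  # Parse the factor labels as times of day (assumption: same day, UTC)
  mutate(Time = as.POSIXct(as.character(Time), format = "%H:%M", tz = "UTC")) %>%
  group_by(User, Event) %>%
  # Fix the unit explicitly so every group is reported in minutes
  summarise(Duration = difftime(max(Time), min(Time), units = "mins"),
            .groups = "drop")

print(result)
```

If the events can cross midnight, the time strings alone are not enough and a real date column would be needed.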
Answer 1 (score: 1)
Here is the data.table way.
Sample data:
x<-structure(list(Nun = 1:6, Event = c(1L, 1L, 1L, 1L, 2L, 2L),
Time = structure(c(1508514600, 1508514720, 1508514840, 1508514960,
1508524440, 1508524560), class = c("POSIXct", "POSIXt"), tzone = ""),
User = structure(c(1L, 1L, 1L, 1L, 2L, 2L), .Label = c("U1",
"U7"), class = "factor")), .Names = c("Nun", "Event", "Time",
"User"), row.names = c(NA, -6L), class = "data.frame")
Code:
require(data.table)
setDT(x)
x[,list(Duration = max(Time)-min(Time)),by = list(Event,User)]
   Event User Duration
1:     1   U1   6 mins
2:     2   U7   2 mins
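One thing to watch with max(Time) - min(Time): subtracting POSIXct values picks the unit automatically from the size of the difference. Here is a variation that fixes the unit with difftime(), which may be safer when durations range from seconds to hours across groups. This is only a sketch: the data is rebuilt by hand from the sample above, and the UTC time zone and the choice of minutes are my own assumptions.

```r
library(data.table)

# Same values as the sample data above, built directly as a data.table
x <- data.table(
  Nun   = 1:6,
  Event = c(1L, 1L, 1L, 1L, 2L, 2L),
  Time  = as.POSIXct(c(1508514600, 1508514720, 1508514840, 1508514960,
                       1508524440, 1508524560),
                     origin = "1970-01-01", tz = "UTC"),
  User  = factor(c("U1", "U1", "U1", "U1", "U7", "U7"))
)

# One row per (Event, User) pair, with the duration always expressed in minutes
res <- x[, .(Duration = difftime(max(Time), min(Time), units = "mins")),
         by = .(Event, User)]

print(res)
```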