How can I summarise this data into 15-minute (clock-time) bins, with the cumulative seconds and the number of unique IDs for each loc?
> dput(df)
structure(list(id = c(131, 146, 160, 146, 160, 146, 160, 137,
157, 144, 124, 144, 119, 119, 242, 242, 235, 235, 145, 262, 258,
160, 145, 135, 148, 148, 143), loc = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
time = structure(c(1425197400, 1425197400, 1425197400, 1425197460,
1425197460, 1425197520, 1425197520, 1425197940, 1425198180,
1425198180, 1425198180, 1425198240, 1425198240, 1425198300,
1425198300, 1425198360, 1425198480, 1425198540, 1425198840,
1425198900, 1425346560, 1425346560, 1425347280, 1425347460,
1425347520, 1425347580, 1425347580), class = c("POSIXct",
"POSIXt")), secs = c(35, 60, 60, 60, 60, 19, 24, 0, 0, 60,
0, 46, 60, 28, 60, 48, 60, 18, 6, 0, 0, 43, 0, 37, 60, 27,
14)), .Names = c("id", "loc", "time", "secs"), row.names = c(NA,
27L), class = "data.frame")
The output for this example should look like this:
> dput(df.out)
structure(list(unique.id = c(3, 7, 2, 2, 4), loc = c("A", "A",
"A", "B", "B"), time = structure(c(1425172501, 1425173400, 1425174300,
1425321900, 1425322800), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
secs = c(318, 380, 6, 43, 138)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -5L), .Names = c("unique.id",
"loc", "time", "secs"))
I have managed to compute the seconds using the xts package:
## disregarding the loc grouping:
df.test <- select(df, time, secs)
df.test <- na.omit(df.test) ##xts with period.sum does not like NA
df.test <- as.xts(df.test, order.by = df.test$time)
df.test <- period.sum(df.test$secs, endpoints(df.test , "mins", k=15))
df.test <- align.time(df.test , 15*60)
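However, I cannot do the same to count the unique IDs. By the way, if anyone has a more elegant solution, I would welcome your input (for example, preparing a period indicator and then simply feeding everything to dplyr::group_by() and summarise()).
Thanks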
Answer 0: (score: 2)
Here is one solution using dplyr. Round each time up to the next 15-minute boundary, then group_by / summarise.
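library(dplyr)
## round each timestamp up to the end of its 15-minute (900 s) bin
df$time <- as.POSIXct(ceiling(as.double(df$time) / (15*60)) * (15*60),
                      origin = '1970-01-01')
## one row per bin and loc: distinct ids and total seconds
df %>%
  group_by(time, loc) %>%
  summarise(unique.id = n_distinct(id), secs = sum(secs)) %>%
  select(unique.id, loc, time, secs)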
The output is:
Source: local data frame [5 x 4]
Groups: time [5]

  unique.id    loc                time  secs
      <int> <fctr>              <dttm> <dbl>
1         3      A 2015-03-01 03:15:00   318
2         7      A 2015-03-01 03:30:00   380
3         2      A 2015-03-01 03:45:00     6
4         2      B 2015-03-02 20:45:00    43
5         4      B 2015-03-02 21:00:00   138
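
To make the rounding step in the answer easier to check, here is a minimal base-R sketch of the same arithmetic on three timestamps. The helper name `ceiling_to_bin` is hypothetical (it does not appear in the answer), and the times are formatted in UTC only so that the result does not depend on the local time zone.

```r
# Hypothetical helper (not part of the original answer): round epoch seconds
# up to the next multiple of `width` seconds. A value that already sits on a
# boundary is left unchanged.
ceiling_to_bin <- function(x, width = 15 * 60) {
  ceiling(as.numeric(x) / width) * width
}

# Two timestamps from the question's data (08:10 and 08:19 UTC) plus one
# illustrative value that is already on a 15-minute boundary (08:30 UTC).
t <- c(1425197400, 1425197940, 1425198600)
bin_end <- ceiling_to_bin(t)

# Format in UTC so the printed labels are the same on every machine.
labels <- format(as.POSIXct(bin_end, origin = "1970-01-01", tz = "UTC"),
                 "%Y-%m-%d %H:%M:%S")
print(labels)
# "2015-03-01 08:15:00" "2015-03-01 08:30:00" "2015-03-01 08:30:00"
```

Each bin is therefore labelled by its end time and covers the interval from just after the previous boundary up to and including that end time. The output printed in the answer shows the same bins as 03:15, 03:30 and so on, which suggests it was rendered in a local time zone five hours behind UTC.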