I have time series data that I want to collapse by hour, computing a sum and a row count for each hour.
library(tidyverse)
library(lubridate)
library(tibbletime)
df.have <- data.frame(v1=ymd_hms("2018-01-01 01:00:00",
"2018-01-01 01:24:20",
"2018-01-01 02:00:00",
"2018-01-01 03:00:00",
"2018-01-01 03:21:20",
"2018-01-01 03:22:20",
#"2018-01-01 04:00:00",
"2018-01-01 05:00:00",
"2018-01-01 06:00:00",
"2018-01-01 06:10:00",
#"2018-01-01 07:00:00",
#"2018-01-01 08:00:00",
"2018-01-01 09:00:00",
"2018-01-02 01:00:00",
"2018-01-02 01:24:20",
"2018-01-02 02:00:00",
"2018-01-02 03:00:00",
"2018-01-02 03:21:20",
"2018-01-02 03:22:20",
#"2018-01-02 04:00:00",
"2018-01-02 05:00:00",
"2018-01-02 06:00:00",
"2018-01-02 06:10:00",
#"2018-01-02 07:00:00",
#"2018-01-02 08:00:00",
"2018-01-02 09:00:00"),
v2=c(1, 0, 1, 1, NA, 1, 1, 1, 1, 1,
1, 0, 1, 1, NA, 1, 1, 1, 1, 1))
df.have %>%
  as_tbl_time(index = v1) %>%
  collapse_by("hourly", side = "start", clean = TRUE) %>%
  group_by(v1) %>%
  mutate(sumv2_byhour = sum(v2, na.rm = TRUE),
         countv1_byhour = n()) %>%
  distinct(v1, .keep_all = TRUE)
# A time tibble: 12 x 4
# Index: v1
# Groups: v1 [12]
v1 v2 sumv2_byhour countv1_byhour
<dttm> <dbl> <dbl> <int>
1 2018-01-01 01:00:00 1.00 1.00 2
2 2018-01-01 02:00:00 2.00 2.00 1
3 2018-01-01 03:00:00 1.00 2.00 3
4 2018-01-01 05:00:00 3.00 3.00 1
5 2018-01-01 06:00:00 1.00 2.00 2
6 2018-01-01 09:00:00 1.00 1.00 1
7 2018-01-02 01:00:00 1.00 1.00 2
8 2018-01-02 02:00:00 2.00 2.00 1
9 2018-01-02 03:00:00 1.00 2.00 3
10 2018-01-02 05:00:00 1.00 1.00 1
11 2018-01-02 06:00:00 4.00 5.00 2
12 2018-01-02 09:00:00 1.00 1.00 1
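So far so good, but I also want rows with a zero count for the hours that are missing from the data. Is there a way to approach this that lets me use dplyr::complete?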
Answer 0 (score: 4)
We can ungroup after the distinct step and then call complete, filling the newly created hours with zeros:
df.have %>%
  as_tbl_time(index = v1) %>%
  collapse_by("hourly", side = "start", clean = TRUE) %>%
  group_by(v1) %>%
  mutate(sumv2_byhour = sum(v2, na.rm = TRUE),
         countv1_byhour = n()) %>%
  distinct(v1, .keep_all = TRUE) %>%
  ungroup %>%
  complete(v1 = seq(v1[1], v1[length(v1)], by = "1 hour"),
           fill = list(v2 = 0, sumv2_byhour = 0, countv1_byhour = 0))
# A tibble: 33 x 4
# v1 v2 sumv2_byhour countv1_byhour
# <dttm> <dbl> <dbl> <dbl>
# 1 2018-01-01 01:00:00 1.00 1.00 2.00
# 2 2018-01-01 02:00:00 1.00 1.00 1.00
# 3 2018-01-01 03:00:00 1.00 2.00 3.00
# 4 2018-01-01 04:00:00 0 0 0
# 5 2018-01-01 05:00:00 1.00 1.00 1.00
# 6 2018-01-01 06:00:00 1.00 2.00 2.00
# 7 2018-01-01 07:00:00 0 0 0
# 8 2018-01-01 08:00:00 0 0 0
# 9 2018-01-01 09:00:00 1.00 1.00 1.00
#10 2018-01-01 10:00:00 0 0 0
# ... with 23 more rows
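For comparison, here is a minimal sketch of the same idea without tibbletime. It is not part of the answer above; it assumes that lubridate::floor_date is an acceptable stand-in for collapse_by, and it uses a small made-up data frame (df_small) rather than df.have. Using summarise instead of mutate plus distinct also drops the leftover v2 column, which may or may not be what you want.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical three-row sample; the 02:00 hour is deliberately missing.
df_small <- data.frame(
  v1 = ymd_hms(c("2018-01-01 01:00:00",
                 "2018-01-01 01:24:20",
                 "2018-01-01 03:00:00")),
  v2 = c(1, 0, NA)
)

hourly <- df_small %>%
  # Round every timestamp down to the start of its hour.
  mutate(v1 = floor_date(v1, unit = "hour")) %>%
  group_by(v1) %>%
  # One row per observed hour; summarise drops the single grouping level.
  summarise(sumv2_byhour = sum(v2, na.rm = TRUE),
            countv1_byhour = n()) %>%
  # Add the hours that never occurred and fill their aggregates with zero.
  complete(v1 = seq(min(v1), max(v1), by = "1 hour"),
           fill = list(sumv2_byhour = 0, countv1_byhour = 0L))

print(hourly)
```

Filling countv1_byhour with 0L rather than 0 is a small design choice that keeps the count column an integer.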
Answer 1 (score: 1)
Another solution, which avoids dplyr::complete() and uses the padr package instead. I find that it describes more directly what you are trying to do.
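The code that accompanied this answer is not preserved here, so the following is only a sketch of what a padr-based version might look like, not the answerer's original code. It assumes the usual padr workflow of thicken, pad and fill_by_value; the sample data frame (df_small) and the column name v1_hour are made up for illustration.

```r
library(dplyr)
library(lubridate)
library(padr)

# Hypothetical three-row sample; the 02:00 hour is deliberately missing.
df_small <- data.frame(
  v1 = ymd_hms(c("2018-01-01 01:00:00",
                 "2018-01-01 01:24:20",
                 "2018-01-01 03:00:00")),
  v2 = c(1, 0, 1)
)

padded <- df_small %>%
  # Add a column holding each timestamp rounded down to the hour.
  thicken(interval = "hour", colname = "v1_hour") %>%
  group_by(v1_hour) %>%
  summarise(sumv2_byhour = sum(v2, na.rm = TRUE),
            countv1_byhour = n()) %>%
  # Insert one row for every hour missing between the first and last hour.
  pad(interval = "hour") %>%
  # Replace the NAs that pad() introduced with zeros.
  fill_by_value(sumv2_byhour, countv1_byhour, value = 0)

print(padded)
```

Here pad() takes the role that complete() plays in the other answer, and fill_by_value() takes the role of its fill argument.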