Question

我有一个时间序列数据集（ts），每天都有卖出。

ts
## A tibble: 40 x 2
#        dates sells
#       <date> <int>
# 1 2014-09-01    32
# 2 2014-09-02     8
# 3 2014-09-03    39
# 4 2014-09-04    38
# 5 2014-09-05     1
# 6 2014-09-06    28
# 7 2014-09-07    33
# 8 2014-09-08    21
# 9 2014-09-09    29
#10 2014-09-10    33
## ... with 30 more rows

我希望定期获得销售总额，例如四天。

在这种情况下，第一个8天的输出将是：

## A tibble: 2 x 1
#  value
#  <dbl>
#1   117
#2    83

我知道在python中使用pandas resample很容易，但是我无法在R中完成。

我的数据：

ts <- structure(list(dates = structure(c(16314, 16315, 16316, 16317, 
16318, 16319, 16320, 16321, 16322, 16323, 16324, 16325, 16326, 
16327, 16328, 16329, 16330, 16331, 16332, 16333, 16334, 16335, 
16336, 16337, 16338, 16339, 16340, 16341, 16342, 16343, 16344, 
16345, 16346, 16347, 16348, 16349, 16350, 16351, 16352, 16353
), class = "Date"), sells = c(32L, 8L, 39L, 38L, 1L, 28L, 33L, 
21L, 29L, 33L, 13L, 32L, 10L, 15L, 19L, 3L, 17L, 35L, 29L, 10L, 
27L, 14L, 30L, 11L, 24L, 31L, 10L, 27L, 32L, 23L, 25L, 2L, 22L, 
4L, 18L, 22L, 15L, 16L, 23L, 3L)), .Names = c("dates", "sells"
), row.names = c(NA, -40L), class = c("tbl_df", "tbl", "data.frame"
))

谢谢。

Answer 1

在R中，一个选项是使用cut.Date中的group_by创建4天的时间间隔，然后获得＆＃39;销售的sum＆＃ 39;

library(dplyr)
out <- ts %>%
         group_by(interval = cut(dates, breaks = '4 day')) %>% 
         summarise(value = sum(sells))
head(out, 2)
# A tibble: 2 x 2
#   interval  value
#  <fctr>     <int>
#1 2014-09-01   117
#2 2014-09-05    83

Answer 2

如果我们切换到时间序列表示，它会使它变得特别简单：

library(zoo)

z <- read.zoo(ts)
z4 <- rollapplyr(z, 4, by = 4, sum)

给出以下每个4天间隔的结束日期索引的时间序列：

> z4
2014-09-04 2014-09-08 2014-09-12 2014-09-16 2014-09-20 2014-09-24 2014-09-28 
       117         83        107         47         91         82         92 
2014-10-02 2014-10-06 2014-10-10 
        82         66         57

（如果您想将输出转换为数据框，则fortify.zoo(z4)或者您只想将和的序列转换为普通向量coredata(z4)。）

library(ggplot2)
autoplot(z4)

R中的常规时间间隔

2 个答案: