I have a huge dataset similar to the following reproducible sample data.
Interval value
1 2012-06-10 552
2 2012-06-11 4850
3 2012-06-12 4642
4 2012-06-13 4132
5 2012-06-14 4190
6 2012-06-15 4186
7 2012-06-16 1139
8 2012-06-17 490
9 2012-06-18 5156
10 2012-06-19 4430
11 2012-06-20 4447
12 2012-06-21 4256
13 2012-06-22 3856
14 2012-06-23 1163
15 2012-06-24 564
16 2012-06-25 4866
17 2012-06-26 4421
18 2012-06-27 4206
19 2012-06-28 4272
20 2012-06-29 3993
21 2012-06-30 1211
22 2012-07-01 698
23 2012-07-02 5770
24 2012-07-03 5103
25 2012-07-04 775
26 2012-07-05 5140
27 2012-07-06 4868
28 2012-07-07 1225
29 2012-07-08 671
30 2012-07-09 5726
31 2012-07-10 5176
I want to aggregate this data to the weekly level so that the output looks like the following:
Interval value
1 Week 2, June 2012 *aggregate value for day 10 to day 14 of June 2012*
2 Week 3, June 2012 *aggregate value for day 15 to day 21 of June 2012*
3 Week 4, June 2012 *aggregate value for day 22 to day 28 of June 2012*
4 Week 5, June 2012 *aggregate value for day 29 to day 30 of June 2012*
5 Week 1, July 2012 *aggregate value for day 1 to day 7 of July 2012*
6 Week 2, July 2012 *aggregate value for day 8 to day 10 of July 2012*
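How can I achieve this easily, without writing long code?
Answer 0 (score: 13)
If you mean the sum of "value", I think the simplest approach is to convert the data to an xts object, as GSee suggested: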
library(xts)
# the date column is named "Interval" (R column names are case-sensitive)
data <- as.xts(data$value, order.by = as.Date(data$Interval))
weekly <- apply.weekly(data, sum)
[,1]
2012-06-10 552
2012-06-17 23629
2012-06-24 23872
2012-07-01 23667
2012-07-08 23552
2012-07-10 10902
I'll leave formatting the output as an exercise :-)
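For readers without xts installed, the same weekly sums can be sketched in base R. This assumes, judging from the output above, that a week runs from Monday to Sunday; the names `dat` and `week_end` are made up for this illustration. Unlike the xts output, the last (incomplete) week is labelled with its Sunday rather than with the last observed date.

```r
# Sample data from the question (2012-06-10 through 2012-07-10)
dat <- data.frame(
  Interval = seq(as.Date("2012-06-10"), by = "day", length.out = 31),
  value = c(552, 4850, 4642, 4132, 4190, 4186, 1139, 490, 5156, 4430,
            4447, 4256, 3856, 1163, 564, 4866, 4421, 4206, 4272, 3993,
            1211, 698, 5770, 5103, 775, 5140, 4868, 1225, 671, 5726, 5176)
)

# Weekday as 0 (Sunday) through 6 (Saturday)
wday <- as.POSIXlt(dat$Interval)$wday

# Assumption: weeks end on Sunday, so move every date forward to its Sunday
dat$week_end <- dat$Interval + (7 - wday) %% 7

# Sum the values within each week
weekly <- aggregate(value ~ week_end, data = dat, FUN = sum)
weekly
```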
Answer 1 (score: 3)
If you use week from lubridate, you only get five weeks to pass to by. Assuming dat is your data,
> library(lubridate)
> do.call(rbind, by(dat$value, week(dat$Interval), summary))
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 24 552 4146 4188 3759 4529 4850
# 25 490 2498 4256 3396 4438 5156
# 26 564 2578 4206 3355 4346 4866
# 27 698 993 4868 3366 5122 5770
# 28 671 1086 3200 3200 5314 5726
This shows a summary for weeks 24 through 28 of the year. Similarly, we can get the means with aggregate:
> aggregate(value~week(Interval), data = dat, mean)
# week(Interval) value
# 1 24 3758.667
# 2 25 3396.286
# 3 26 3355.000
# 4 27 3366.429
# 5 28 3199.500
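To check which dates end up in which week number, the grouping can be sketched in base R. This assumes the definition given in lubridate's documentation for `week()` (complete seven-day periods since 1 January, plus one); `dat`, `by_week` and `res` are names made up for this illustration. The week boundaries, and therefore the group sizes, may differ from the printed output above if that output came from a different lubridate version.

```r
# Sample data from the question (2012-06-10 through 2012-07-10)
dat <- data.frame(
  Interval = seq(as.Date("2012-06-10"), by = "day", length.out = 31),
  value = c(552, 4850, 4642, 4132, 4190, 4186, 1139, 490, 5156, 4430,
            4447, 4256, 3856, 1163, 564, 4866, 4421, 4206, 4272, 3993,
            1211, 698, 5770, 5103, 775, 5140, 4868, 1225, 671, 5726, 5176)
)

# Day of year counted from 0, so 1 January is 0
yday0 <- as.POSIXlt(dat$Interval)$yday

# Assumption: week = complete 7-day periods since 1 January, plus one
dat$week <- yday0 %/% 7 + 1

# One row per week: the date range it covers and the mean value
by_week <- split(dat, dat$week)
res <- do.call(rbind, lapply(by_week, function(g) {
  data.frame(week = g$week[1],
             from = min(g$Interval),
             to   = max(g$Interval),
             mean = mean(g$value))
}))
res
```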
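Answer 2 (score: 3)
If you are working with a data frame, you can do this easily with the tidyquant package. Use the tq_transmute function, which applies a mutation and returns a new data frame. Select the "value" column and apply the xts function apply.weekly. The additional argument FUN = sum produces the weekly totals.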
library(tidyquant)
df
#> # A tibble: 31 x 2
#> Interval value
#> <date> <int>
#> 1 2012-06-10 552
#> 2 2012-06-11 4850
#> 3 2012-06-12 4642
#> 4 2012-06-13 4132
#> 5 2012-06-14 4190
#> 6 2012-06-15 4186
#> 7 2012-06-16 1139
#> 8 2012-06-17 490
#> 9 2012-06-18 5156
#> 10 2012-06-19 4430
#> # ... with 21 more rows
df %>%
    tq_transmute(select     = value,
                 mutate_fun = apply.weekly,
                 FUN        = sum)
#> # A tibble: 6 x 2
#> Interval value
#> <date> <int>
#> 1 2012-06-10 552
#> 2 2012-06-17 23629
#> 3 2012-06-24 23872
#> 4 2012-07-01 23667
#> 5 2012-07-08 23552
#> 6 2012-07-10 10902
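Answer 3 (score: 1)
I just came across this old question because it was used as a duplicate target.
Unfortunately, all of the upvoted answers (except the one by konvas and a now-deleted one) present solutions that aggregate the data by week of the year, while the OP asked to aggregate by week of the month.
The definitions of week of the year and week of the month are ambiguous, as several related discussions point out.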
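However, the OP has indicated that he wants to count days 1 to 7 of each month as week 1 of the month, days 8 to 14 as week 2 of the month, and so on. Note that week 5 is a stub: in most months it consists of only 2 or 3 days (February being the exception).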
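The week-of-month rule described above can be sketched in base R as follows. The helper name `week_of_month` is made up for this illustration; counting the days per week for a 31-day month shows the short fifth week.

```r
# Hypothetical helper: days 1-7 are week 1, days 8-14 are week 2, and so on
week_of_month <- function(date) {
  (as.integer(format(date, "%d")) - 1L) %/% 7L + 1L
}

# All days of a 31-day month
july <- seq(as.Date("2012-07-01"), as.Date("2012-07-31"), by = "day")

# Number of days that fall into each week of the month
days_per_week <- table(week_of_month(july))
days_per_week
```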
Having laid the groundwork, here is a data.table solution for this kind of aggregation:
library(data.table)
DT[, .(value = sum(value)),
   by = .(Interval = sprintf("Week %i, %s",
                             (mday(Interval) - 1L) %/% 7L + 1L,
                             format(Interval, "%b %Y")))]
           Interval value
1: Week 2, Jun 2012 18366
2: Week 3, Jun 2012 24104
3: Week 4, Jun 2012 23348
4: Week 5, Jun 2012  5204
5: Week 1, Jul 2012 23579
6: Week 2, Jul 2012 11573
We can verify that we have picked the correct intervals with
DT[, .(value = sum(value),
       date_range = toString(range(Interval))),
   by = .(Week = sprintf("Week %i, %s",
                         (mday(Interval) - 1L) %/% 7L + 1L,
                         format(Interval, "%b %Y")))]
               Week value             date_range
1: Week 2, Jun 2012 18366 2012-06-10, 2012-06-14
2: Week 3, Jun 2012 24104 2012-06-15, 2012-06-21
3: Week 4, Jun 2012 23348 2012-06-22, 2012-06-28
4: Week 5, Jun 2012  5204 2012-06-29, 2012-06-30
5: Week 1, Jul 2012 23579 2012-07-01, 2012-07-07
6: Week 2, Jul 2012 11573 2012-07-08, 2012-07-10
which is in line with the OP's specification. The data used above:
library(data.table)
DT <- fread(
"rn Interval value
1 2012-06-10 552
2 2012-06-11 4850
3 2012-06-12 4642
4 2012-06-13 4132
5 2012-06-14 4190
6 2012-06-15 4186
7 2012-06-16 1139
8 2012-06-17 490
9 2012-06-18 5156
10 2012-06-19 4430
11 2012-06-20 4447
12 2012-06-21 4256
13 2012-06-22 3856
14 2012-06-23 1163
15 2012-06-24 564
16 2012-06-25 4866
17 2012-06-26 4421
18 2012-06-27 4206
19 2012-06-28 4272
20 2012-06-29 3993
21 2012-06-30 1211
22 2012-07-01 698
23 2012-07-02 5770
24 2012-07-03 5103
25 2012-07-04 775
26 2012-07-05 5140
27 2012-07-06 4868
28 2012-07-07 1225
29 2012-07-08 671
30 2012-07-09 5726
31 2012-07-10 5176", drop = 1L)
DT[, Interval := as.Date(Interval)]
Answer 4 (score: 0)
When you say "aggregate" the values, do you mean taking their sum? Let's say your data frame is d; assuming d$Interval is of class Date, you can try
# if d$Interval is not of class Date
d$Interval <- as.Date(d$Interval)
# days 1-7 -> week 1, days 8-14 -> week 2, ...
formatdate <- function(date)
    paste0("Week ", (as.numeric(format(date, "%d")) - 1) %/% 7 + 1,
           ", ", format(date, "%b %Y"))
# change "sum" to your required function
aggregate(d$value, by = list(formatdate(d$Interval)), sum)
#            Group.1     x
# 1 Week 1, Jul 2012 23579
# 2 Week 2, Jul 2012 11573
# 3 Week 2, Jun 2012 18366
# 4 Week 3, Jun 2012 24104
# 5 Week 4, Jun 2012 23348
# 6 Week 5, Jun 2012  5204
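Because aggregate sorts character groups alphabetically, the July rows come out before the June rows. One possible way to keep chronological order, sketched here with made-up names (`label`, `wk`, `res`), is to group on a factor whose levels follow the order in which the labels first appear. The sketch also uses `month.abb` instead of `format(date, "%b")` so the month names do not depend on the locale; it assumes d is already sorted by date.

```r
# Sample data from the question (2012-06-10 through 2012-07-10)
d <- data.frame(
  Interval = seq(as.Date("2012-06-10"), by = "day", length.out = 31),
  value = c(552, 4850, 4642, 4132, 4190, 4186, 1139, 490, 5156, 4430,
            4447, 4256, 3856, 1163, 564, 4866, 4421, 4206, 4272, 3993,
            1211, 698, 5770, 5103, 775, 5140, 4868, 1225, 671, 5726, 5176)
)

# Build "Week <n>, <Mon> <Year>" labels with locale-independent month names
day <- as.integer(format(d$Interval, "%d"))
label <- sprintf("Week %d, %s %s",
                 (day - 1L) %/% 7L + 1L,
                 month.abb[as.POSIXlt(d$Interval)$mon + 1L],
                 format(d$Interval, "%Y"))

# Factor levels in order of first appearance keep the weeks chronological
wk <- factor(label, levels = unique(label))

res <- aggregate(d$value, by = list(Interval = wk), FUN = sum)
res
```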