Suppose I have several years of data that looks like this:
# load date package and set random seed
library(lubridate)
set.seed(42)
# create data.frame of dates and income
date <- seq(dmy("26-12-2010"), dmy("15-01-2011"), by = "days")
df <- data.frame(date = date,
wday = wday(date),
wday.name = wday(date, label = TRUE, abbr = TRUE),
income = round(runif(21, 0, 100)),
week = format(date, format="%Y-%U"),
stringsAsFactors = FALSE)
# date wday wday.name income week
# 1 2010-12-26 1 Sun 91 2010-52
# 2 2010-12-27 2 Mon 94 2010-52
# 3 2010-12-28 3 Tues 29 2010-52
# 4 2010-12-29 4 Wed 83 2010-52
# 5 2010-12-30 5 Thurs 64 2010-52
# 6 2010-12-31 6 Fri 52 2010-52
# 7 2011-01-01 7 Sat 74 2011-00
# 8 2011-01-02 1 Sun 13 2011-01
# 9 2011-01-03 2 Mon 66 2011-01
# 10 2011-01-04 3 Tues 71 2011-01
# 11 2011-01-05 4 Wed 46 2011-01
# 12 2011-01-06 5 Thurs 72 2011-01
# 13 2011-01-07 6 Fri 93 2011-01
# 14 2011-01-08 7 Sat 26 2011-01
# 15 2011-01-09 1 Sun 46 2011-02
# 16 2011-01-10 2 Mon 94 2011-02
# 17 2011-01-11 3 Tues 98 2011-02
# 18 2011-01-12 4 Wed 12 2011-02
# 19 2011-01-13 5 Thurs 47 2011-02
# 20 2011-01-14 6 Fri 56 2011-02
# 21 2011-01-15 7 Sat 90 2011-02
I want to sum "income" for each week (Sunday through Saturday). Currently I do the following:
Weekending 2011-01-01 = sum(df$income[1:7]) = 487
Weekending 2011-01-08 = sum(df$income[8:14]) = 387
Weekending 2011-01-15 = sum(df$income[15:21]) = 443
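But I would like a more robust approach that sums by week automatically. I can't work out how to group the data into weeks automatically. Any help would be much appreciated.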
Answer 0 (score: 8)
First use format() to convert your dates to week numbers, then plyr::ddply() to calculate the summaries:
library(plyr)
df$week <- format(df$date, format="%Y-%U")
ddply(df, .(week), summarize, income=sum(income))
week income
1 2011-52 413
2 2012-01 435
3 2012-02 379
See ?strptime for details of format.Date, in particular the bit that defines %U as the week number.
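Edit:
Given the modified data and requirement, one way is to divide the date by 7 to get a number representing the week. (Or, more precisely, divide by the number of seconds in a week to get the number of weeks since the epoch, which is 1970-01-01 by default.)
In code: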
# as.POSIXct() yields seconds since the epoch whether date is a Date or a POSIXct
df$week <- as.Date("1970-01-01")+7*trunc(as.numeric(as.POSIXct(df$date))/(3600*24*7))
library(plyr)
ddply(df, .(week), summarize, income=sum(income))
week income
1 2010-12-23 298
2 2010-12-30 392
3 2011-01-06 294
4 2011-01-13 152
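I have not checked that the week boundaries fall on Sunday. You will have to check this and insert an appropriate offset into the formula.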
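One possible way to work out that offset, sketched in base R on the question's data: 1970-01-01 was a Thursday, so the first Sunday after the epoch is day 3 (1970-01-04). The income values are copied from the question's table, and the names days, offset and week_start are illustrative, not part of the answer above.

```r
# The question's data, rebuilt without extra packages.
date <- seq(as.Date("2010-12-26"), as.Date("2011-01-15"), by = "day")
income <- c(91, 94, 29, 83, 64, 52, 74,
            13, 66, 71, 46, 72, 93, 26,
            46, 94, 98, 12, 47, 56, 90)

days <- as.numeric(date)   # whole days since 1970-01-01 (a Thursday)
offset <- 3                # 1970-01-04 (day 3) was a Sunday

# Step each date back to the Sunday that starts its week.
week_start <- as.Date(days - (days - offset) %% 7, origin = "1970-01-01")

weekly <- tapply(income, format(week_start), sum)
print(weekly)
# 2010-12-26: 487, 2011-01-02: 387, 2011-01-09: 443
```

Changing offset to 4 would give Monday-based weeks instead.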
Answer 1 (score: 8)
This is easy with dplyr now. Also, I would suggest using cut(breaks = "week") rather than format() to cut the dates into weeks.
library(dplyr)
df %>% group_by(week = cut(date, "week")) %>% mutate(weekly_income = sum(income))
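One detail worth checking with cut(): for Date input, weeks start on Monday by default, and the start.on.monday argument of cut.Date switches that to Sunday. The sketch below shows the difference in base R, using tapply() instead of dplyr so that it runs without extra packages; the data is copied from the question.

```r
date <- seq(as.Date("2010-12-26"), as.Date("2011-01-15"), by = "day")
income <- c(91, 94, 29, 83, 64, 52, 74,
            13, 66, 71, 46, 72, 93, 26,
            46, 94, 98, 12, 47, 56, 90)

# Default: each level is labelled with the Monday that starts the week.
week_mon <- cut(date, breaks = "week")

# Sunday-based weeks, as the question asks for.
week_sun <- cut(date, breaks = "week", start.on.monday = FALSE)

print(levels(week_mon))
print(tapply(income, week_sun, sum))
# Sunday-based totals: 487, 387, 443
```

In the dplyr pipeline, the same argument can be passed as cut(date, "week", start.on.monday = FALSE); using summarise() instead of mutate() would return one row per week rather than one per day.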
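Answer 2 (score: 1)
I googled "group weekdays into weeks R" and came across this SO question. You mention that you have multiple years, so I think we need to keep track of both the week number and the year, so I modified the answer there to use format(date, format = "%y%U").
In use it looks like this: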
library(plyr) #for aggregating
df <- transform(df, weeknum = format(date, format = "%y%U"))
ddply(df, "weeknum", summarize, suminc = sum(income))
#----
weeknum suminc
1 1152 413
2 1201 435
3 1202 379
See ?strptime for all of the format abbreviations.
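The order of the two fields in the key matters for sorting. The sketch below compares a year-first key with a week-first key on three Sundays that appear to correspond to the weeks in the output above (an assumption inferred from the week numbers shown); it uses base R only.

```r
# Three consecutive Sundays spanning the 2011/2012 year boundary.
d <- as.Date(c("2011-12-25", "2012-01-01", "2012-01-08"))

year_first <- format(d, format = "%y%U")   # "1152" "1201" "1202"
week_first <- format(d, format = "%U%y")   # "5211" "0112" "0212"

# Year-first keys sort in chronological order; week-first keys do not.
print(sort(year_first))
print(sort(week_first))
```

Because grouped output is ordered by the key, putting the year first keeps the weekly totals in date order across years.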
Answer 3 (score: 1)
Try rollapply from the zoo package:
library(zoo)
rollapply(df$income, width=7, FUN = sum, by = 7)
# [1] 487 387 443
Or, use period.sum from the xts package:
library(xts)
period.sum(xts(df$income, order.by=df$date), which(df$wday %in% 7))
# [,1]
# 2011-01-01 487
# 2011-01-08 387
# 2011-01-15 443
Or, to get the output in the format you want:
data.frame(income = period.sum(xts(df$income, order.by=df$date),
which(df$wday %in% 7)),
week = df$week[which(df$wday %in% 7)])
# income week
# 2011-01-01 487 2011-00
# 2011-01-08 387 2011-01
# 2011-01-15 443 2011-02
Note that the first week shows up as 2011-00, because that is how it was entered in the data. You could also use week = df$week[which(df$wday %in% 1)], which would match your output.
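For readers without zoo or xts, the idea of summing between Saturday endpoints can be sketched in base R with cumsum() and diff(). This is an illustration of the concept, not what period.sum does internally; the data is copied from the question and the names ends and totals are illustrative.

```r
date <- seq(as.Date("2010-12-26"), as.Date("2011-01-15"), by = "day")
income <- c(91, 94, 29, 83, 64, 52, 74,
            13, 66, 71, 46, 72, 93, 26,
            46, 94, 98, 12, 47, 56, 90)

# Day of week as 1 = Sunday ... 7 = Saturday, matching df$wday in the question.
wday <- as.POSIXlt(date)$wday + 1

# Row positions of the Saturdays that close each week.
ends <- which(wday == 7)

# Running total at each Saturday, differenced to get per-week sums.
totals <- diff(c(0, cumsum(income)[ends]))

weekly <- data.frame(week.ending = date[ends], income = totals)
print(weekly)
# income column: 487, 387, 443
```

Unlike the fixed-width rollapply() call, this does not assume that the data starts on a Sunday or has exactly seven rows per week, but any rows after the last Saturday are left out.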
Answer 4 (score: 0)
This solution is influenced by @Andrie and @Chase.
# load plyr
library(plyr)
# format weeks as per requirement (replace "00" with "52" and adjust corresponding year)
tmp <- list()
tmp$y <- format(df$date, format="%Y")
tmp$w <- format(df$date, format="%U")
tmp$y[tmp$w=="00"] <- as.character(as.numeric(tmp$y[tmp$w=="00"]) - 1)
tmp$w[tmp$w=="00"] <- "52"
df$week <- paste(tmp$y, tmp$w, sep = "-")
# get summary
df2 <- ddply(df, .(week), summarize, income=sum(income))
# include week ending date
tmp$week.ending <- lapply(df2$week, function(x) rev(df[df$week==x, "date"])[[1]])
df2$week.ending <- sapply(tmp$week.ending, as.character)
# week income week.ending
# 1 2010-52 487 2011-01-01
# 2 2011-01 387 2011-01-08
# 3 2011-02 443 2011-01-15
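A possible alternative, sketched in base R, is to compute the week-ending Saturday directly from each date and group on that, which avoids patching week "00". Note that rewriting "00" as "52" assumes the previous year ends in week 52; under %U some years end in week 53 (2017, for example). The data is copied from the question and week_ending is an illustrative name.

```r
date <- seq(as.Date("2010-12-26"), as.Date("2011-01-15"), by = "day")
income <- c(91, 94, 29, 83, 64, 52, 74,
            13, 66, 71, 46, 72, 93, 26,
            46, 94, 98, 12, 47, 56, 90)

# POSIXlt wday is 0 = Sunday ... 6 = Saturday, so adding (6 - wday) days
# moves every date forward to the Saturday that ends its week.
week_ending <- date + (6 - as.POSIXlt(date)$wday)

weekly <- tapply(income, format(week_ending), sum)
print(weekly)
# 2011-01-01: 487, 2011-01-08: 387, 2011-01-15: 443
```

The group labels are then the week-ending dates themselves, so no separate lookup step is needed.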
Answer 5 (score: 0)
df.index = df['week']  # use the dt variable as the index
df.resample('W').sum()  # sum using resample
Answer 6 (score: 0)
Using dplyr:
library(dplyr)
df %>%
arrange(date) %>%
mutate(week = as.numeric(date - date[1])%/%7) %>%
group_by(week) %>%
summarise(weekincome= sum(income))
Instead of date[1], you can use any date from which you want the weekly grouping to start.
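As a sketch of that idea in base R, the example below counts weeks from a fixed anchor date rather than from the first row. The anchor 2010-12-19 is a hypothetical choice (a Sunday before the data starts); the data is copied from the question.

```r
date <- seq(as.Date("2010-12-26"), as.Date("2011-01-15"), by = "day")
income <- c(91, 94, 29, 83, 64, 52, 74,
            13, 66, 71, 46, 72, 93, 26,
            46, 94, 98, 12, 47, 56, 90)

# Hypothetical anchor: any Sunday works for Sunday-to-Saturday weeks.
anchor <- as.Date("2010-12-19")

# Whole weeks elapsed since the anchor; %/% floors, so dates before the
# anchor would get negative week numbers rather than being misgrouped.
week <- as.numeric(date - anchor) %/% 7

weekly <- tapply(income, week, sum)
print(weekly)
# weeks 1, 2, 3: 487, 387, 443
```

A fixed anchor keeps the week numbering stable even if earlier rows are later added to or removed from the data.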