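I have the table below, and I need to aggregate columns 4 to 5 according to the week definition given below for a given month.
For example, for any given month, my week definition for the purchase_Date column is as follows: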
wk1: 1-6 days
wk2: 7-12 days
wk3: 13-18 days
wk4: 19-24 days
wk5: 25-31 days
Year County purchase_Date acres_purchase Date_Diff
2010 Cache 9/28/2009 30.5 1
2010 Cache 10/1/2009 5.0 4
2010 Cache 10/3/2009 10.2 3
2010 Cache 10/5/2009 20 3
2010 Cache 10/7/2009 15 5
2010 Cache 10/13/2009 5 1
2010 Cache 10/14/2009 6 2
2010 Cache 10/19/2009 25 7
2010 Cache 10/25/2009 12 3
2010 Cache 10/30/2009 2 1
Output:
Year County purchase_Date Week purchase_by_date Date_Diff
2010 Cache 9/28/2009 Sep-wk5 30.5 1
2010 Cache 10/1/2009 Oct-wk1 35.2 10
2010 Cache 10/7/2009 Oct-wk2 15 5
2010 Cache 10/13/2009 Oct-wk3 11 3
2010 Cache 10/19/2009 Oct-wk4 25 7
2010 Cache 10/25/2009 Oct-wk5 14 4
Is there a way to produce the "Output" table in R?
Any help is appreciated.
Answer 0 (score: 1)
First convert purchase_Date to the Date class, then extract purchase_Day:
df1$purchase_Date <- as.Date(df1$purchase_Date, format= "%m/%d/%Y")
df1$purchase_Day <- as.numeric(format(df1$purchase_Date, "%d"))
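The two lines above assume that df1 already holds the question's table with purchase_Date stored as text. A minimal reproducible sketch follows; the name df1 comes from this answer, while the column types are assumptions, since the question does not show how the data was loaded.

```r
# Rebuild the question's table. Column types are assumed, not given in the question.
df1 <- data.frame(
  Year = 2010L,
  County = "Cache",
  purchase_Date = c("9/28/2009", "10/1/2009", "10/3/2009", "10/5/2009",
                    "10/7/2009", "10/13/2009", "10/14/2009", "10/19/2009",
                    "10/25/2009", "10/30/2009"),
  acres_purchase = c(30.5, 5, 10.2, 20, 15, 5, 6, 25, 12, 2),
  Date_Diff = c(1L, 4L, 3L, 3L, 5L, 1L, 2L, 7L, 3L, 1L),
  stringsAsFactors = FALSE
)

# Same conversion as above: parse month/day/year text, then extract the day of month.
df1$purchase_Date <- as.Date(df1$purchase_Date, format = "%m/%d/%Y")
df1$purchase_Day <- as.numeric(format(df1$purchase_Date, "%d"))

df1$purchase_Day
# [1] 28  1  3  5  7 13 14 19 25 30
```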
Define a helper function that assigns each range of days to the correct week.
weekGroup <- function(x){
  if (x <= 6) {
    week <- "wk1"
  } else if (x <= 12) {
    week <- "wk2"
  } else if (x <= 18) {
    week <- "wk3"
  } else if (x <= 24) {
    week <- "wk4"
  } else {
    week <- "wk5"
  }
  return(week)
}
Pass each day through our helper function:
df1$week <- sapply(df1$purchase_Day, weekGroup)
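The sapply call is needed because if/else only inspects a single value at a time. As a possible alternative, base R's cut() can bin the whole vector in one call. The sketch below is not part of the original answer, and the name weekGroupVec is hypothetical.

```r
# Hypothetical vectorized variant of weekGroup(); not from the original answer.
weekGroupVec <- function(day) {
  # cut() intervals are right-closed by default:
  # (0,6], (6,12], (12,18], (18,24], (24,31]
  as.character(cut(day,
                   breaks = c(0, 6, 12, 18, 24, 31),
                   labels = paste0("wk", 1:5)))
}

# Days of month taken from the question's table.
days <- c(28, 1, 3, 5, 7, 13, 14, 19, 25, 30)
weekGroupVec(days)
# [1] "wk5" "wk1" "wk1" "wk1" "wk2" "wk3" "wk3" "wk4" "wk5" "wk5"
```

With this variant the assignment would read df1$week <- weekGroupVec(df1$purchase_Day), with no sapply.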
Pull the month into a separate column and convert it to numeric:
df1$month <- as.numeric(format(df1$purchase_Date, "%m"))
month.abb is a built-in character vector of month abbreviations. Use the numeric month to pick out the corresponding element:
df1$monthAbb <- sapply(df1$month, function(x) month.abb[x])
Combine week and monthAbb:
df1$monthWeek <- paste(df1$monthAbb,df1$week, sep="-")
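Because month.abb is an ordinary character vector, it can also be indexed with a whole vector of month numbers at once, so the sapply wrapper is optional. A small sketch on made-up stand-in vectors (month_num and week are illustrative names, not columns from the answer):

```r
# Stand-in values: month numbers and week labels for three purchases.
month_num <- c(9, 10, 10)
week <- c("wk5", "wk1", "wk2")

# Vectorized lookup of abbreviations, then paste with the week label.
monthWeek <- paste(month.abb[month_num], week, sep = "-")
monthWeek
# [1] "Sep-wk5" "Oct-wk1" "Oct-wk2"
```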
@cmaher has essentially already provided this, but for completeness, here is the final summary:
library(dplyr)
df1 %>% group_by(Year, County, monthWeek) %>%
  summarise(purchaseDate = min(purchase_Date),
            acres = sum(acres_purchase),
            date_diff = sum(Date_Diff))
   Year County monthWeek purchaseDate acres date_diff
  <int> <fctr>     <chr>       <date> <dbl>     <int>
1  2010  Cache   Oct-wk1   2009-10-01  35.2        10
2  2010  Cache   Oct-wk2   2009-10-07  15.0         5
3  2010  Cache   Oct-wk3   2009-10-13  11.0         3
4  2010  Cache   Oct-wk4   2009-10-19  25.0         7
5  2010  Cache   Oct-wk5   2009-10-25  14.0         4
6  2010  Cache   Sep-wk5   2009-09-28  30.5         1
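Note that the rows come back sorted alphabetically by monthWeek, so Sep-wk5 lands last, whereas the question's Output lists it first. One way to restore chronological order is to sort on the date column. The sketch below uses a stand-in data frame named res (a hypothetical name) with values copied from the output above.

```r
# Stand-in for the summary printed above; res is a hypothetical name.
res <- data.frame(
  monthWeek = c("Oct-wk1", "Oct-wk2", "Oct-wk3", "Oct-wk4", "Oct-wk5", "Sep-wk5"),
  purchaseDate = as.Date(c("2009-10-01", "2009-10-07", "2009-10-13",
                           "2009-10-19", "2009-10-25", "2009-09-28")),
  acres = c(35.2, 15, 11, 25, 14, 30.5),
  date_diff = c(10L, 5L, 3L, 7L, 4L, 1L),
  stringsAsFactors = FALSE
)

# Reorder rows by date so that Sep-wk5 comes first, as in the question's Output.
res <- res[order(res$purchaseDate), ]
res$monthWeek
# [1] "Sep-wk5" "Oct-wk1" "Oct-wk2" "Oct-wk3" "Oct-wk4" "Oct-wk5"
```

Inside the dplyr pipeline, appending %>% arrange(purchaseDate) after summarise should give the same ordering.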
Answer 1 (score: 0)
Assuming your purchase_Date variable is of class Date, you can use lubridate::day() and base::findInterval to bin the dates:
df$Week <- findInterval(lubridate::day(df$purchase_Date), c(7, 13, 19, 25, 32)) + 1
df$Week <- as.factor(paste(lubridate::month(df$purchase_Date), df$Week, sep = "-"))
# purchase_Date Week
# 2017-10-01 10-1
# 2017-10-02 10-1
# 2017-10-03 10-1
# ...
# 2017-10-29 10-5
# 2017-10-30 10-5
# 2017-10-31 10-5
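To see why these break points reproduce the question's week definition, it may help to run findInterval over every possible day of the month. The check below uses only base R; the variable names are illustrative.

```r
# Every possible day of month.
days <- 1:31

# findInterval() counts how many break points are <= each day;
# adding 1 turns that count (0..4) into a week number (1..5).
week <- findInterval(days, c(7, 13, 19, 25, 32)) + 1

# Number of days that fall in each week.
table(week)
# week
# 1 2 3 4 5
# 6 6 6 6 7
```

The last break point, 32, is never reached by a day of month, so c(7, 13, 19, 25) gives the same result.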
Then, one way to produce the target output is with dplyr, as follows:
library(dplyr)
# Group on County (the column name in the question's table).
df %>% group_by(Year, County, Week) %>%
  summarize(
    purchase_Date = min(purchase_Date),
    purchase_by_date = sum(acres_purchase),
    Date_Diff = sum(Date_Diff))
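The Week labels built in this answer look like 10-1, while the question's Output uses labels such as Oct-wk1. A minimal base-R sketch of one way to produce that format follows; it swaps lubridate for format(), and the variable names are illustrative rather than taken from the answer.

```r
# Three sample purchase dates from the question.
dates <- as.Date(c("2009-09-28", "2009-10-01", "2009-10-07"))

# Day and month as integers, using base R instead of lubridate.
day <- as.integer(format(dates, "%d"))
month <- as.integer(format(dates, "%m"))

# Same binning as above, then month abbreviation plus "-wk" plus week number.
week <- findInterval(day, c(7, 13, 19, 25)) + 1
label <- paste0(month.abb[month], "-wk", week)
label
# [1] "Sep-wk5" "Oct-wk1" "Oct-wk2"
```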