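Aggregate a column by custom-defined weeks in R

Time: 2017-10-25 17:06:40

Tags: r aggregate

I have the table below, and I need to aggregate columns 4 and 5 (acres_purchase and Date_Diff) according to the week definition given below for each month.

For example, for any given month, my week definition for the purchase date column is as follows: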

wk1: 1-6 days
wk2: 7-12 days
wk3: 13-18 days
wk4: 19-24 days
wk5: 25-31 days

Year    County   purchase_Date    acres_purchase  Date_Diff   
2010    Cache    9/28/2009        30.5                 1
2010    Cache    10/1/2009        5.0                  4
2010    Cache    10/3/2009        10.2                 3
2010    Cache    10/5/2009        20                   3
2010    Cache    10/7/2009        15                   5 
2010    Cache    10/13/2009       5                    1 
2010    Cache    10/14/2009       6                    2
2010    Cache    10/19/2009       25                   7
2010    Cache    10/25/2009       12                   3
2010    Cache    10/30/2009       2                    1


Output:

    Year    County   purchase_Date  Week          purchase_by_date  Date_Diff   
    2010    Cache    9/28/2009    Sep-wk5          30.5                 1
    2010    Cache    10/1/2009    Oct-wk1          35.2                 10
    2010    Cache    10/7/2009    Oct-wk2          15                   5
    2010    Cache    10/13/2009   Oct-wk3          11                   3
    2010    Cache    10/19/2009   Oct-wk4          25                   7
    2010    Cache    10/25/2009   Oct-wk5          14                   4

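Is there a way to produce the "Output" table in R?

Any help is appreciated.

2 Answers:

Answer 0: (score: 1)

First convert purchase_Date to the Date class, then extract the day of the month into purchase_Day: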

df1$purchase_Date <- as.Date(df1$purchase_Date, format= "%m/%d/%Y")

df1$purchase_Day <- as.numeric(format(df1$purchase_Date, "%d"))

Define a helper function that assigns each range of days to the correct week.

weekGroup <- function(x){
  if (x <= 6) {
     week <- "wk1"
  } else if (x <= 12) {
     week <- "wk2"
  } else if (x <= 18) {
     week <- "wk3"
  } else if (x <= 24) {
     week <- "wk4"
  } else {
     week <-"wk5"
  }
  return(week)
}

Pass each day through the helper function:

df1$week <- sapply(df1$purchase_Day, weekGroup)
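As a vectorized alternative to the if/else helper, base R's cut() can bin all days in one call. This is only a minimal sketch and not part of the original answer; the `days` vector is illustrative input made up from the day-of-month values in the question.

```r
# Day-of-month values taken from the question's purchase_Date column.
days <- c(28, 1, 3, 5, 7, 13, 14, 19, 25, 30)

# Intervals are right-closed by default: (0,6], (6,12], (12,18], (18,24], (24,31].
week <- cut(days,
            breaks = c(0, 6, 12, 18, 24, 31),
            labels = paste0("wk", 1:5))

# cut() returns a factor; convert to character to match weekGroup()'s output.
week <- as.character(week)
print(week)
```

The breaks are the upper bounds of each custom week, so changing the week definition only means editing that one vector.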

Pull the month into a separate column and convert it to numeric:

df1$month <- as.numeric(format(df1$purchase_Date, "%m"))

month.abb is a built-in character vector of month abbreviations. Index it with the numeric month to get the matching abbreviation:

df1$monthAbb <- sapply(df1$month, function(x) month.abb[x])

Combine week and monthAbb:

df1$monthWeek <- paste(df1$monthAbb,df1$week, sep="-")
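Because month.abb is an ordinary character vector, it can also be indexed with a whole vector of month numbers at once, which makes the sapply() wrapper optional. A small sketch, where `month` and `week` are hypothetical stand-ins for the columns built above:

```r
# Hypothetical inputs standing in for df1$month and df1$week.
month <- c(9, 10, 10)
week <- c("wk5", "wk1", "wk2")

# Vectorized indexing into month.abb, then paste the two parts together.
monthWeek <- paste(month.abb[month], week, sep = "-")
print(monthWeek)
```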

@cmaher has essentially already provided this, but for completeness, here is the final summary:

library(dplyr)

df1 %>% group_by(Year, County,monthWeek) %>%
 summarise(purchaseDate=min(purchase_Date),acres=sum(acres_purchase),
 date_diff=sum(Date_Diff))


  Year County monthWeek purchaseDate acres date_diff
  <int> <fctr>     <chr>       <date> <dbl>     <int>
1  2010  Cache   Oct-wk1   2009-10-01  35.2        10
2  2010  Cache   Oct-wk2   2009-10-07  15.0         5
3  2010  Cache   Oct-wk3   2009-10-13  11.0         3
4  2010  Cache   Oct-wk4   2009-10-19  25.0         7
5  2010  Cache   Oct-wk5   2009-10-25  14.0         4
6  2010  Cache   Sep-wk5   2009-09-28  30.5         1
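Since the question is tagged aggregate, the same summary can also be sketched in base R without dplyr. This is an assumption-laden illustration rather than the answerer's method: the data frame is retyped from the question's table, and the helper names `date_num`, `sums`, `first`, and `result` are made up for this example.

```r
# Sample data copied from the question's table.
df1 <- data.frame(
  Year = 2010L,
  County = "Cache",
  purchase_Date = as.Date(
    c("9/28/2009", "10/1/2009", "10/3/2009", "10/5/2009", "10/7/2009",
      "10/13/2009", "10/14/2009", "10/19/2009", "10/25/2009", "10/30/2009"),
    format = "%m/%d/%Y"),
  acres_purchase = c(30.5, 5, 10.2, 20, 15, 5, 6, 25, 12, 2),
  Date_Diff = c(1, 4, 3, 3, 5, 1, 2, 7, 3, 1)
)

# Build the "Mon-wkN" label from the day and month of each purchase.
day <- as.integer(format(df1$purchase_Date, "%d"))
month <- as.integer(format(df1$purchase_Date, "%m"))
df1$monthWeek <- paste0(month.abb[month], "-wk",
                        findInterval(day, c(7, 13, 19, 25)) + 1)

# Dates are aggregated as day counts and converted back afterwards,
# so the result does not depend on how aggregate() handles Date objects.
df1$date_num <- as.numeric(df1$purchase_Date)

sums <- aggregate(cbind(acres_purchase, Date_Diff) ~ Year + County + monthWeek,
                  data = df1, FUN = sum)
first <- aggregate(date_num ~ Year + County + monthWeek,
                   data = df1, FUN = min)
first$purchase_Date <- as.Date(first$date_num, origin = "1970-01-01")
first$date_num <- NULL

# Join the two summaries on the grouping columns and sort chronologically.
result <- merge(first, sums, by = c("Year", "County", "monthWeek"))
result <- result[order(result$purchase_Date), ]
print(result)
```

Sorting by the earliest purchase date puts Sep-wk5 first, matching the row order in the question's desired output.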

Answer 1: (score: 0)

Assuming your purchase_Date variable is of class Date, you can use lubridate::day() and base::findInterval to bin the dates:

df$Week <- findInterval(lubridate::day(df$purchase_Date), c(7, 13, 19, 25, 32)) + 1
df$Week <- as.factor(paste(lubridate::month(df$purchase_Date), df$Week, sep = "-"))
#    purchase_Date Week
#    2017-10-01    10-1
#    2017-10-02    10-1
#    2017-10-03    10-1
#    ...
#    2017-10-29    10-5
#    2017-10-30    10-5
#    2017-10-31    10-5
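To check that these break points match the week definition in the question, the binning can be run over every possible day of the month. A quick sketch using only base R; the `days` and `counts` names are illustrative, and the trailing 32 from the call above is left out here because no valid day reaches it.

```r
# Every possible day of the month.
days <- 1:31

# findInterval() counts how many break points are <= each day,
# so days 1-6 give 0, days 7-12 give 1, and so on; add 1 for the week number.
week <- findInterval(days, c(7, 13, 19, 25)) + 1

# Number of days that fall into each custom week.
counts <- table(week)
print(counts)
```

Weeks 1 to 4 each cover six days, and week 5 absorbs the remaining days 25 to 31.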

Then, one way to get the target output is to use dplyr, as follows:

library(dplyr)

# Sum acres and Date_Diff per custom week; keep the earliest purchase date.
df %>% group_by(Year, County, Week) %>%
  summarize(
    purchase_Date = min(purchase_Date),
    purchase_by_date = sum(acres_purchase),
    Date_Diff = sum(Date_Diff))