Summarizing information by quarter in a data.table

Asked: 2017-07-02 10:56:38

Tags: r data.table aggregate

I have this data.table, which is the result of aggregating a larger one:

data.table(Period = c('2018.01', '2018.02'), sales = c(8850, 7950), qty = c(650, 650))

    Period sales qty
1: 2018.01  8850 650
2: 2018.02  7950 650

What I need, and have not managed to do, is to add quarterly and yearly totals to it, so that the result would be:

data.table(Period = c('2018.01', '2018.02', '2018Q1', '2018'), sales = c(8850, 7950, 16800, 16800), qty = c(650, 650, 1300, 1300))

   Period sales  qty
1: 2018.01  8850  650
2: 2018.02  7950  650
3:  2018Q1 16800 1300
4:    2018 16800 1300

I have tried: dt = rbind(dt, dt[, lapply(.SD, sum), by = .(Period), .SDcols = c('sales', 'qty')])

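But I just get duplicated rows, because Period is already at month level and grouping by it returns the same rows again: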

    Period sales qty
1: 2018.01 8850 650
2: 2018.02 7950 650
3: 2018.01 8850 650
4: 2018.02 7950 650

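In addition, I need the quarter rows to be labelled with the quarter (Q1, Q2, Q3, Q4) and the year row to show only the year. How can this be done?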

Edit

Although the accepted answer is correct, I have reworked it so that I need neither extra columns nor a new library:

DT = data.table(Period = c('2018.01', '2018.02'), sales = c(8850, 7950), qty = c(650, 650))

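# Turn '2018.01' into the number 201801 so it can be binned with cut()
# (base sub() is used here, so stringr is not required)
DT$Period = as.double(sub(".", "", DT$Period, fixed = TRUE))
ints      = setInterval(2018)
# Quarterly totals: cut() maps each month number to its quarter label
dt        = DT[, lapply(.SD, sum), by = .(Period = cut(Period, breaks = ints$i, labels = ints$q)), .SDcols = c('sales', 'qty')]
# Yearly total: sum the quarterly rows into a single row labelled '2018'
dt        = rbind(dt, dt[Period %in% ints$q, c(list(Period = '2018'), lapply(.SD, sum)), .SDcols = c('sales', 'qty')], fill = TRUE)
# Restore the original 'YYYY.MM' format before appending the totals
DT$Period = paste(substr(DT$Period, 1, 4), ".", substr(DT$Period, 5, 6), sep = "")
DT        = rbind(DT, dt)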

I needed to create this helper function, which has to be defined before the code above is run:

setInterval = function (year) {
   y = year * 100
   return (list(
      i = c(y, y + 3, y + 6, y + 9, y + 12),
      q = paste(year, '.', c('Q1', 'Q2', 'Q3', 'Q4'), sep = '')
   ))
}
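
To make the helper's role concrete, here is a small sketch of what it returns and how cut() uses it. The helper is repeated so the snippet runs on its own; the variable names months and quarters are only illustrative and are not part of the original post.

```r
# Helper from the post, repeated so this sketch is self-contained
setInterval = function (year) {
   y = year * 100
   return (list(
      i = c(y, y + 3, y + 6, y + 9, y + 12),
      q = paste(year, '.', c('Q1', 'Q2', 'Q3', 'Q4'), sep = '')
   ))
}

ints = setInterval(2018)
ints$i   # break points: 201800 201803 201806 201809 201812
ints$q   # labels: "2018.Q1" "2018.Q2" "2018.Q3" "2018.Q4"

# cut() uses right-closed intervals by default, so 201801..201803 fall in
# (201800, 201803] and get the first label, and so on up to 201812
months   = 201801:201812
quarters = as.character(cut(months, breaks = ints$i, labels = ints$q))
```

Each run of three consecutive months ends up with the same quarter label, which is what the grouped sum relies on.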

2 Answers:

Answer 0 (score: 2)

library(data.table)
library(zoo)  # provides as.yearqtr()

dt <- data.table(Period = c('2018.01', '2018.02'), sales = c(8850, 7950), qty = c(650, 650))
dt$Period_YQ <- as.character(as.yearqtr(paste(dt$Period, "01", sep="."), "%Y.%m.%d"))
dt$Period_Y <- strtrim(dt$Period, 4)

dt1 <- dt[,.SD,.SDcols=c(1:3)]
dt2 <- dt[,lapply(.SD,sum), by="Period_YQ", .SDcols = c('sales', 'qty')]
colnames(dt2) <- c('Period','sales', 'qty')
dt3 <- dt[,lapply(.SD,sum), by="Period_Y", .SDcols = c('sales', 'qty')]
colnames(dt3) <- c('Period','sales', 'qty')
rbind(dt1,dt2,dt3)
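
Note that zoo prints quarters in its default format (for example '2018 Q1'), while the question asks for '2018Q1'. A minimal sketch of one way to get that exact label without zoo, assuming Period is always 'YYYY.MM' text; the names yr, mon, qlab, quarterly, yearly and result are made up for this illustration:

```r
library(data.table)

dt <- data.table(Period = c('2018.01', '2018.02'), sales = c(8850, 7950), qty = c(650, 650))

# Split 'YYYY.MM' into year and month; (month - 1) %/% 3 + 1 is the quarter number
yr   <- substr(dt$Period, 1, 4)
mon  <- as.integer(substr(dt$Period, 6, 7))
qlab <- paste0(yr, "Q", (mon - 1L) %/% 3L + 1L)

cols      <- c('sales', 'qty')
# Group by the derived labels; the grouping column is named Period so the
# totals line up with the monthly rows when stacked
quarterly <- dt[, lapply(.SD, sum), by = .(Period = qlab), .SDcols = cols]
yearly    <- dt[, lapply(.SD, sum), by = .(Period = yr), .SDcols = cols]

result <- rbind(dt, quarterly, yearly)
```

This keeps the monthly rows first, followed by one row per quarter and one row per year.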

Hope this helps!

Answer 1 (score: 0)

A similar but different approach, using lubridate and dplyr:

Convert your Period to a date format. I like to use lubridate::parse_date_time. Note that I also create new columns for Year and Quarter:

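library(dplyr)      # provides %>%, mutate, group_by, summarise, full_join
library(lubridate)
# df is the data from the question, as a data frame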
df <- df %>% 
      mutate(Period = parse_date_time(Period, "ym")) %>%
      mutate(Year = year(Period)) %>% 
      mutate(Quarter = quarter(Period))

Then compute the Yearly and Quarterly totals separately:

Yearly <- df %>% 
          group_by(Year) %>%
          summarise(Y.sales = sum(sales), Y.qty = sum(qty))

Quarterly <- df %>%
             group_by(Year, Quarter) %>%
             summarise(Q.sales = sum(sales), Q.qty = sum(qty))

Finally, combine everything with full_join:

final <- full_join(Yearly, Quarterly, by=c("Year")) %>% 
         full_join(., df, by=c("Year","Quarter"))

This gives you a tidier (IMO) output, organized by Year, then Quarter, then Period:

   Year Y.sales Y.qty Quarter Q.sales Q.qty     Period sales   qty
  <dbl>   <dbl> <dbl>   <int>   <dbl> <dbl>     <dttm> <dbl> <dbl>
1  2018   16800  1300       1   16800  1300 2018-01-01  8850   650
2  2018   16800  1300       1   16800  1300 2018-02-01  7950   650
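
If the stacked layout from the question is preferred over the wide one, the same dplyr tools can produce it. This is only a sketch under the assumption that Period is still the original 'YYYY.MM' text (that is, before the date conversion above); the names keyed, quarterly, yearly and stacked are illustrative.

```r
library(dplyr)

df <- data.frame(Period = c('2018.01', '2018.02'), sales = c(8850, 7950), qty = c(650, 650),
                 stringsAsFactors = FALSE)

# Derive text labels for the year and the quarter from 'YYYY.MM'
keyed <- df %>%
  mutate(Year    = substr(Period, 1, 4),
         Quarter = paste0(Year, "Q", (as.integer(substr(Period, 6, 7)) - 1L) %/% 3L + 1L))

# One total row per quarter, with the label stored in the Period column
quarterly <- keyed %>%
  transmute(Period = Quarter, sales, qty) %>%
  group_by(Period) %>%
  summarise(sales = sum(sales), qty = sum(qty))

# One total row per year, labelled with the year only
yearly <- keyed %>%
  transmute(Period = Year, sales, qty) %>%
  group_by(Period) %>%
  summarise(sales = sum(sales), qty = sum(qty))

# Monthly rows first, then quarterly totals, then yearly totals
stacked <- bind_rows(df, quarterly, yearly)
```

Because all three pieces share the columns Period, sales and qty, bind_rows can stack them directly.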