Increment dates based on the original start date

Time: 2019-03-15 17:27:58

Tags: r dplyr

I have a dataset:

library(data.table)
dt <- data.table(Customer = c("a", "a","b","b"), months = c(2,2,2,3), Date = c("2014-03-1","2015-10-1","2015-01-1","2016-01-1"), Cost = c("100","200","50","20"))
   Customer months      Date Cost
1:        a      2 2014-03-1  100
2:        a      2 2015-10-1  200
3:        b      2 2015-01-1   50
4:        b      3 2016-01-1   20

I want to repeat each row as many times as its "months" value:

dt %>% mutate(New.Date.month = as.Date(Date), rn1 = row_number()) %>% 
  slice(rep(rn1, months))%>%
  group_by(Customer, rn1) %>%
  mutate(New.Date.month = seq(first(Date), by="1 month", length.out=n()))
  Customer months Date       Cost  New.Date.month   rn1
  <chr>     <dbl> <date>     <chr> <date>         <int>
1 a             2 2014-03-01 100   2014-03-01         1
2 a             2 2014-03-01 100   2014-04-01         1
3 a             2 2015-10-01 200   2015-10-01         2
4 a             2 2015-10-01 200   2015-11-01         2
5 b             2 2015-01-01 50    2015-01-01         3
6 b             2 2015-01-01 50    2015-02-01         3
7 b             3 2016-01-01 20    2016-01-01         4
8 b             3 2016-01-01 20    2016-02-01         4
9 b             3 2016-01-01 20    2016-03-01         4

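However, I want to group by customer and have "New.Date.month" increase in 1-month steps starting from that customer's first date, so my desired output looks like: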

  Customer months Date       Cost  New.Date.month   rn1
  <chr>     <dbl> <date>     <chr> <date>         <int>
1 a             2 2014-03-01 100   2014-03-01         1
2 a             2 2014-03-01 100   2014-04-01         1
3 a             2 2015-10-01 200   2014-05-01         2
4 a             2 2015-10-01 200   2014-06-01         2
5 b             2 2015-01-01 50    2015-01-01         3
6 b             2 2015-01-01 50    2015-02-01         3
7 b             3 2016-01-01 20    2015-03-01         4
8 b             3 2016-01-01 20    2015-04-01         4
9 b             3 2016-01-01 20    2015-05-01         4

I would really appreciate your help.

Thanks.

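1 answer:

Answer 0 (score: 2)

We need to remove "rn1" from the group_by step. Grouping by both Customer and rn1 restarts the sequence for every original row; grouping by Customer alone builds one continuous monthly sequence per customer, starting from that customer's first date: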

library(dplyr)
dt %>% 
   mutate(New.Date.month = as.Date(Date), rn1 = row_number()) %>% 
   slice(rep(rn1, months))%>%
   group_by(Customer) %>% 
   mutate(New.Date.month = seq(first(New.Date.month), by="1 month", length.out=n()))
# A tibble: 9 x 6
# Groups:   Customer [2]
#  Customer months Date      Cost  New.Date.month   rn1
#  <chr>     <dbl> <chr>     <chr> <date>         <int>
#1 a             2 2014-03-1 100   2014-03-01         1
#2 a             2 2014-03-1 100   2014-04-01         1
#3 a             2 2015-10-1 200   2014-05-01         2
#4 a             2 2015-10-1 200   2014-06-01         2
#5 b             2 2015-01-1 50    2015-01-01         3
#6 b             2 2015-01-1 50    2015-02-01         3
#7 b             3 2016-01-1 20    2015-03-01         4
#8 b             3 2016-01-1 20    2015-04-01         4
#9 b             3 2016-01-1 20    2015-05-01         4
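
To make the role of seq() more concrete, here is a minimal sketch of what happens inside one group. It assumes customer "a" only, whose first date is 2014-03-1 and whose rows expand to 2 + 2 = 4 rows; the variable names start, n_rows and new_dates are illustrative and not part of the original code.

```r
# First date of customer "a" (assumed from the sample data)
start <- as.Date("2014-03-1")

# Number of rows for customer "a" after expansion: 2 + 2
n_rows <- 4

# One sequence for the whole group, stepping by one calendar month
new_dates <- seq(start, by = "1 month", length.out = n_rows)
print(new_dates)
```

Because length.out is the group size (n() in the dplyr code), dropping rn1 from group_by is what lets the sequence run across all of a customer's rows instead of restarting at each original row.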

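This can be simplified with uncount from tidyr (no need to create the "rn1" column). Note that uncount drops the "months" column by default, which is why it is missing from the output: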

library(tidyr)
dt %>% 
  uncount(months) %>% 
  group_by(Customer) %>%
  mutate(New.Date.month = seq(as.Date(first(Date)),
             by = "1 month", length.out = n()))
# A tibble: 9 x 4
# Groups:   Customer [2]
#  Customer Date      Cost  New.Date.month
#  <chr>    <chr>     <chr> <date>        
#1 a        2014-03-1 100   2014-03-01    
#2 a        2014-03-1 100   2014-04-01    
#3 a        2015-10-1 200   2014-05-01    
#4 a        2015-10-1 200   2014-06-01    
#5 b        2015-01-1 50    2015-01-01    
#6 b        2015-01-1 50    2015-02-01    
#7 b        2016-01-1 20    2015-03-01    
#8 b        2016-01-1 20    2015-04-01    
#9 b        2016-01-1 20    2015-05-01    
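
If the "months" column should be kept, a possible variant is to pass .remove = FALSE to uncount. The sketch below is self-contained and rebuilds the sample data as a plain data.frame (an assumption made only so the example runs on its own); res is an illustrative name. It also relies on each customer's earliest date appearing first within the group, as in the sample data.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt as a data.frame for this sketch
dt <- data.frame(
  Customer = c("a", "a", "b", "b"),
  months   = c(2, 2, 2, 3),
  Date     = c("2014-03-1", "2015-10-1", "2015-01-1", "2016-01-1"),
  Cost     = c("100", "200", "50", "20"),
  stringsAsFactors = FALSE
)

res <- dt %>%
  # Repeat each row `months` times, keeping the weight column
  uncount(months, .remove = FALSE) %>%
  group_by(Customer) %>%
  # One monthly sequence per customer, starting at the first date in the group
  mutate(New.Date.month = seq(as.Date(first(Date)),
                              by = "1 month", length.out = n())) %>%
  ungroup()

print(res)
```

If the rows are not guaranteed to be ordered by date within each customer, replacing as.Date(first(Date)) with min(as.Date(Date)) would be a safer starting point.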

Also, since the initial dataset is a data.table, we can use data.table methods as well:

library(data.table)
dt[rep(seq_len(.N), months)][,  New.Date.month := seq(as.Date(Date)[1],
     by = "1 month", length.out = .N), Customer][]
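
The data.table version can also be written in two steps, which may be easier to read. This is a sketch under the same sample data; res is an illustrative name. Since the row expansion creates a new table, the := assignment adds the column to res by reference and leaves dt itself unchanged.

```r
library(data.table)

# Sample data from the question
dt <- data.table(
  Customer = c("a", "a", "b", "b"),
  months   = c(2, 2, 2, 3),
  Date     = c("2014-03-1", "2015-10-1", "2015-01-1", "2016-01-1"),
  Cost     = c("100", "200", "50", "20")
)

# Step 1: repeat each row index `months` times (returns a new data.table)
res <- dt[rep(seq_len(.N), months)]

# Step 2: per customer, add a monthly sequence starting at the first date
res[, New.Date.month := seq(as.Date(Date)[1], by = "1 month", length.out = .N),
    by = Customer]

print(res)
```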