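I have a dataset:
library(data.table)
library(dplyr)
dt <- data.table(Customer = c("a", "a", "c"), months = c(24, 12, 37), Date = c("2019-02-23","2019-03-31","2019-10-01"), Cost = c("100","200","370"))
I want to break the cost down by year, keeping a repeat customer's rows apart by row number: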
months_to_year <- function(months){
if(months%%12==0) y <- rep(12, months %/% 12) else y <- c(rep(12, months %/% 12), months %% 12)
return(y)
}
dt$years<- dt$months/12
dt$Cost <- as.numeric(dt$Cost)
dt<- dt %>% mutate(Date = as.Date(Date), rn = row_number()) %>%
slice(rep(rn, ceiling(months/12)))%>%
group_by(Customer, rn) %>%
mutate(months1 = months_to_year(first(months)),
Date = seq(first(Date), by="1 year", length.out=n()),
Cost = Cost/months * months1)
I get the following output:
Customer months Date Cost years rn months1
<chr> <dbl> <date> <dbl> <dbl> <int> <dbl>
1 a 24 2019-02-23 50 2 1 12
2 a 24 2020-02-23 50 2 1 12
3 a 12 2019-03-31 200 1 2 12
4 c 37 2019-10-01 120 3.08 3 12
5 c 37 2020-10-01 120 3.08 3 12
6 c 37 2021-10-01 120 3.08 3 12
7 c 37 2022-10-01 10 3.08 3 1
Now I want to break it down further by month:
dt %>% mutate(Date = as.Date(Date), rn1 = row_number()) %>%
slice(rep(rn1, months1))%>%
group_by(Customer, rn1) %>%
mutate(New.Date = seq(first(Date), by="1 month", length.out=n()))
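However, customer "a" in row 3 is indexed as rn1 = 1, so its new start date continues one month on from the previous rn = 1 block for customer "a". See rows 12 and 25 of the New.Date column. I would like New.Date in row 25 to start at 2019-03-31.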
(Screenshot: dt output)
I would really appreciate your help.
Thanks.
Answer 0 (score: 0)
This might work. I edited the sample data a little so that the columns have the correct types.
library(data.table)
library(lubridate) # provides %m+% and the numeric method for months()
dt <- data.table(Customer = c("a", "a", "c"), months_num = c(24, 12, 37), Date = c("2019-02-23","2019-03-31","2019-10-01"), Cost = c(100,200,370))
#set dates
dt[, Date := as.POSIXct( Date, format = "%Y-%m-%d" ) ]
dt[, EndDate := Date %m+% months( months_num )][]
str(dt)
# Classes ‘data.table’ and 'data.frame': 3 obs. of 5 variables:
# $ Customer : chr "a" "a" "c"
# $ months_num: num 24 12 37
# $ Date : POSIXct, format: "2019-02-23" "2019-03-31" "2019-10-01"
# $ Cost : num 100 200 370
# $ EndDate : POSIXct, format: "2021-02-23" "2020-03-31" "2022-11-01"
Code
It is actually a one-liner, but it is spread over several lines for readability on SO.
#monthly
dt[ , .( Customer = Customer,
month = seq(Date,
by = "month",
length.out = months_num ),
Cost = Cost / months_num ),
by = .(id = 1:nrow(dt) )][]
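One thing to watch for with the monthly `seq()` above: when the start date falls on the 31st (as with 2019-03-31), stepping by "month" can roll over into the following month. The sketch below is a possible variant, not a tested replacement. It assumes `Date` is stored as a `Date` rather than `POSIXct`, and it uses lubridate's `%m+%`, which rolls back to the last day of a shorter month. The name `monthly` is only illustrative.

```r
library(data.table)
library(lubridate)

# Same sample data as in the answer, with Date stored as Date (an assumption
# made for this sketch; the answer above uses POSIXct).
dt <- data.table(Customer   = c("a", "a", "c"),
                 months_num = c(24, 12, 37),
                 Date       = as.Date(c("2019-02-23", "2019-03-31", "2019-10-01")),
                 Cost       = c(100, 200, 370))

# One output row per month of each original row. Grouping by the original row
# index keeps the two contracts of customer "a" separate, so each block starts
# at its own Date.
monthly <- dt[, .(Customer = Customer,
                  # offsets 0, 1, ..., months_num - 1 months from the start date;
                  # %m+% rolls back to month end instead of overflowing
                  month    = Date %m+% months(seq_len(months_num) - 1),
                  Cost     = Cost / months_num),
              by = .(id = seq_len(nrow(dt)))]
```

Because the grouping is on the original row index rather than on `Customer`, the second block for customer "a" starts at its own 2019-03-31, which is the behaviour asked for in the question.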