So here is an example of my data:
> d
customer date revenue
1: A 2016-01-01 32
2: A 2016-01-03 88
3: A 2016-01-04 80
4: A 2016-02-01 38
5: B 2016-01-13 44
6: B 2016-01-24 11
7: B 2016-01-25 50
8: B 2016-02-26 46
> dput(d)
structure(list(customer = c("A", "A", "A", "A", "B", "B", "B",
"B"), date = structure(c(16801, 16803, 16804, 16832, 16813, 16824,
16825, 16857), class = "Date"), revenue = c(32, 88, 80, 38, 44,
11, 50, 46)), .Names = c("customer", "date", "revenue"), row.names = c(NA,
-8L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0000000002a60788>)
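What I want to do is create a column, let's call it roll_sum_3days. This column is a rolling sum of the revenue that occurs afterwards, where the window size depends on the date column. In this case, roll_sum_3days is the sum of the revenue that occurs after the row's date and no later than 3 days after it.
The expected result would look like this: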
customer date revenue roll_sum_3days
1: A 2016-01-01 32 168
2: A 2016-01-03 88 80
3: A 2016-01-04 80 0
4: A 2016-02-01 38 0
5: B 2016-01-13 44 0
6: B 2016-01-24 11 96
7: B 2016-01-25 50 46
8: B 2016-01-26 46 0
Answer 0 (score: 3)
A possible solution:
library(lubridate) # for the '%m+%'-function
d[, roll_sum_3d := .SD[.SD[, .(date, date2 = date %m+% days(3), revenue)]
, on = .(date > date, date <= date2)
][, sum(revenue, na.rm = TRUE), by = date]$V1
, by = customer][]
which gives:
customer date revenue roll_sum_3d
1: A 2016-01-01 32 168
2: A 2016-01-03 88 80
3: A 2016-01-04 80 0
4: A 2016-02-01 38 0
5: B 2016-01-13 44 0
6: B 2016-01-24 11 96
7: B 2016-01-25 50 46
8: B 2016-01-26 46 0
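What this does:
- Group `d` by customer with `by = customer`.
- Add `roll_sum_3d` with `:=`.
- `roll_sum_3d` is calculated by joining `.SD` (Subset of Data) with `.SD[, .(date, date2 = date %m+% days(3), revenue)]` using the non-equi join `on = .(date > date, date <= date2)`, then summing the revenue for each date and returning it.
Another option, based on @Arun's comment, is also possible.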
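One way such an alternative might look is to run the non-equi join on the whole table, with customer as an extra join column, and aggregate per lookup row with `by = .EACHI`. This is a sketch under that assumption, not the code from @Arun's comment. It uses plain `date + 3` instead of lubridate's `%m+%`, and the names `windows`, `start`, `end` and `res` are illustrative only.

```r
library(data.table)

# Same data as in the question (row 8 dated 2016-02-26, as in dput(d)).
d <- data.table(
  customer = rep(c("A", "B"), each = 4),
  date = as.Date(c("2016-01-01", "2016-01-03", "2016-01-04", "2016-02-01",
                   "2016-01-13", "2016-01-24", "2016-01-25", "2016-02-26")),
  revenue = c(32, 88, 80, 38, 44, 11, 50, 46)
)

# One window (start, end] per row of d.
windows <- d[, .(customer, start = date, end = date + 3)]

# Non-equi self-join: for each window, match the rows of d that belong to
# the same customer and whose date lies in (start, end], then sum their
# revenue. by = .EACHI evaluates j once per row of 'windows'; windows with
# no match contribute NA, which na.rm = TRUE turns into a sum of 0.
res <- d[windows,
         on = .(customer, date > start, date <= end),
         .(roll_sum_3d = sum(revenue, na.rm = TRUE)),
         by = .EACHI]

# The result has one row per window, in the same order as d.
d[, roll_sum_3d := res$roll_sum_3d]
d[]
```

With the data from `dput(d)`, rows 6 and 7 come out as 50 and 0 rather than 96 and 46, because row 8 is dated 2016-02-26 there and not 2016-01-26.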
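Answer 1 (score: 1)
Hi, I think there is another mistake in your example: observation 8 does not add to the sums of the two previous observations, because it is from February. Never mind. If you want to use the `apply()` and `as.POSIXct()` functions, you can do it like this:
df <- data.frame(customer = c("A", "A", "A", "A", "B", "B", "B", "B"),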
date = structure(c(16801, 16803, 16804, 16832, 16813, 16824,
16825, 16857), class = "Date"),
revenue = c(32, 88, 80, 38, 44, 11, 50, 46))
df$date <- as.POSIXct(df$date)
calc <- function(x){
date <- as.POSIXct(unlist(x["date"]),origin = "1970-01-01")
customer <- unlist(x["customer"])
# Here you choose what to sum: rows of the same customer dated after this row and at most 3 days later
# 86400 is the number of seconds in a day
output <- sum(df[df$date > date & df$date <= (date+86400*3) & df$customer==customer,"revenue"])
return(output)
}
df$sum <- apply(df,1,calc)
# Convert back if you want your original date format.
df$date <- as.Date(df$date)
df
customer date revenue sum
1 A 2016-01-01 32 168
2 A 2016-01-03 88 80
3 A 2016-01-04 80 0
4 A 2016-02-01 38 0
5 B 2016-01-13 44 0
6 B 2016-01-24 11 50
7 B 2016-01-25 50 0
8 B 2016-02-26 46 0
I could not keep your date format, because the operator could not be used with it.
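For comparison, here is a minimal base R sketch that keeps the Date class throughout. It loops over row indices with `vapply()` instead of passing rows through `apply()`, which converts a data frame with mixed column types to a character matrix first. The helper name `roll_sum_after` and its `days` argument are illustrative, not part of the answer above.

```r
df <- data.frame(
  customer = c("A", "A", "A", "A", "B", "B", "B", "B"),
  date = as.Date(c("2016-01-01", "2016-01-03", "2016-01-04", "2016-02-01",
                   "2016-01-13", "2016-01-24", "2016-01-25", "2016-02-26")),
  revenue = c(32, 88, 80, 38, 44, 11, 50, 46)
)

# For row i, sum the revenue of the same customer's rows whose date lies
# in (date_i, date_i + days]. Date objects support > and <= directly, and
# adding a number to a Date shifts it by that many days.
roll_sum_after <- function(i, data, days = 3) {
  in_window <- data$customer == data$customer[i] &
    data$date > data$date[i] &
    data$date <= data$date[i] + days
  sum(data$revenue[in_window])
}

df$sum <- vapply(seq_len(nrow(df)), roll_sum_after, numeric(1), data = df)
df
```

This compares every row against every other row, so it is only practical for small tables; the data.table non-equi join scales better.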