问题类似于How do I do a conditional sum which only looks between certain date criteria,但略有不同,答案不符合当前问题。主要区别在于基于每个组的日期列可能不一定完整(即某些日期可能丢失)
输入:
input <- read.table(text="
2017-04-01 A 1
2017-04-02 B 2
2017-04-02 B 2
2017-04-02 C 2
2017-04-02 A 2
2017-04-03 C 3
2017-04-04 A 4
2017-04-05 B 5
2017-04-06 C 6
2017-04-07 A 7
2017-04-08 B 8
2017-04-09 C 9")
colnames(input) <- c("Date","Group","Score")
规则:对于每个日期的每个组,回顾3个日历日期(包括当前日期)。计算总和。
预期产出:
Date Group 3DaysSumPerGroup
2017-04-01 A 1 #1 previous two dates are not available. partial is allowed
2017-04-02 A 3 #2+1 both 4-01 and 4-02 are in the range
2017-04-04 A 6 #4+2
2017-04-07 A 7 #7
2017-04-02 B 4 # 2+2 at the same day
2017-04-05 B 5
2017-04-08 B 8
2017-04-02 C 2
2017-04-03 C 5
2017-04-06 C 6
2017-04-09 C 9
我尝试使用rollapply with partial = T,但结果似乎不正确。
input %>%
group_by(Group) %>%
arrange(Date) %>% mutate("3DaysSumPerGroup"=rollapply(data=Score,width=3,align="right",FUN=sum,partial=T,fill=NA,rm.na=T))
答案 0 :(得分:4)
这是一个(假设有效的)解决方案,使用data.table(v1.9.8 +)中新的非equi连接和by = .EACHI
功能
library(data.table) #v1.10.4
## Convert to a proper date class, and add another column in order to define the range
setDT(input)[, c("Date", "Date2") := {
Date = as.IDate(Date)
Date2 = Date - 2L
.(Date, Date2)
}]
## Run a non-equi join against the unique Date/Group combination in input
## Sum the Scores on the fly
## You can ignore the second Date column
input[unique(input, by = c("Date", "Group")), ## This removes the dupes
on = .(Group, Date <= Date, Date >= Date2), ## The join condition
.(Score = sum(Score)), ## sum the scores
keyby = .EACHI] ## Run the sum by each row in unique(input, by = c("Date", "Group"))
# Group Date Date Score
# 1: A 2017-04-01 2017-03-30 1
# 2: A 2017-04-02 2017-03-31 3
# 3: A 2017-04-04 2017-04-02 6
# 4: A 2017-04-07 2017-04-05 7
# 5: B 2017-04-02 2017-03-31 4
# 6: B 2017-04-05 2017-04-03 5
# 7: B 2017-04-08 2017-04-06 8
# 8: C 2017-04-02 2017-03-31 2
# 9: C 2017-04-03 2017-04-01 5
# 10: C 2017-04-06 2017-04-04 6
# 11: C 2017-04-09 2017-04-07 9