Based on R

Time: 2018-02-15 03:21:32

Tags: r dplyr data.table

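My question is inspired by "Cumulative sum in a window (or running window sum) based on a condition in R".

I want to compute a running window sum as in the post above, with a slight twist: even when no row satisfies the filter condition, I want the cumulative sum to be carried forward for up to `k` years. In other words, rows need to be added to the original dataset.

This is challenging for me because I am still not used to using apply-style functions inside data.table.

Here is my input data: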

DFI <- structure(list(Year = c(2011, 2013, 2014, 2010, 2012, 2015), 
    Customer = c(13575, 13575, 13575, 13575, 13576, 13576), Product = c("R", 
    "R", "R", "W", "S", "R"), Rev = c(4, 1, 2, 1, 2, 2)), .Names = c("Year", 
"Customer", "Product", "Rev"), row.names = c(NA, -6L), class = "data.frame")

Here is my expected output:

DFO <- structure(list(Year = c(2011, 2012, 2013, 2014, 2015, 2010, 2011, 
2015, 2012, 2013), Customer = c(13575, 13575, 13575, 13575, 13575, 
13575, 13575, 13576, 13576, 13576), Product = c("R", "R", "R", 
"R", "R", "W", "W", "R", "S", "S"), Rev = c(4, 0, 1, 2, 0, 1, 
0, 2, 2, 0), CumRev = c(4, 4, 1, 3, 2, 1, 1, 2, 2, 2)), .Names = c("Year", 
"Customer", "Product", "Rev", "CumRev"), class = "data.frame", row.names = c(NA, 
-10L))

Some comments on how I manually generated DFO:

a) Number of years in the window = 2, i.e. k = 2

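b) Although DFI (the input data) has no entry for Year = 2012, Customer = 13575, Product = R, that row was added because the cumulative sum from Year = 2011 is carried forward for k-1 = 2-1 = 1 year. Hence, for this row, Rev = 0 and CumRev = 4.

c) The entry for Year = 2015, Customer = 13575, Product = R was added because at least one entry for Year = 2015 exists in the DFI table. In other words, the range of Year values to add (or carry forward) depends on two things: 1) the range of Year in the input table, and 2) the length of the running window.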
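Rules (a) to (c) can be traced by hand for a single group. The following is only a minimal base R sketch for Customer = 13575, Product = R; the variable names (`years_in_group`, `carried`, `cum_rev`) are made up for illustration and are not part of the original code.

```r
k <- 2
years_in_group <- c(2011, 2013, 2014)  # Years present in DFI for Customer 13575, Product R
rev_in_group   <- c(4, 1, 2)           # matching Rev values
max_year <- 2015                       # latest Year anywhere in DFI

# Rule (b): every observed year also "covers" the next k - 1 years.
carried <- sort(unique(as.vector(outer(years_in_group, seq_len(k) - 1, `+`))))
# Rule (c): never go past the range of Year in the input table.
carried <- carried[carried <= max_year]

# Rev is 0 for the years that were added.
rev <- numeric(length(carried))
rev[match(years_in_group, carried)] <- rev_in_group

# Window sum over the years [y - k + 1, y].
cum_rev <- sapply(carried, function(y) sum(rev[carried >= y - k + 1 & carried <= y]))

data.frame(Year = carried, Rev = rev, CumRev = cum_rev)
```

The resulting rows agree with the first five rows of DFO (Years 2011 to 2015 for this group).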

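Now, I did try to solve this myself before posting. I have spent almost 36 hours on it, and I was able to solve it. The problem is that with expand.grid on the actual data, I run out of memory. So I would like to know whether there is a better way (computationally cheaper and more memory-efficient) to solve this.

Here is my code:

library(data.table)
DFI <- data.table::as.data.table(DFI) #DFI must be a data.table for the join below
Year <- unique(DFI$Year)
Customer <- unique(DFI$Customer)
Product <- unique(DFI$Product)
DFO1 <- expand.grid(Year = Year, Customer = Customer, Product = Product) #generate all combinations
DFO1 <- data.table::as.data.table(DFO1)
#Do join between DFO and DFI to add Rev
DFO1 <- DFI[DFO1, on = c("Product", "Customer", "Year")]
k <- 2 #Number of years = 2
DFO1 <- DFO1[order(Customer, Product, Year)]
DFO1[is.na(Rev), Rev := 0] #combinations missing from DFI get zero revenue
DFO1 <- DFO1[, CumRev := sapply(Year, function(year) sum(Rev[between(Year, year-k+1, year)])), by = .(Customer, Product)][order(Customer, Product, Year)]
DFO1 <- DFO1[CumRev != 0] #Remove zero rows
DFO <- data.table::as.data.table(DFO)
DFO <- DFO[order(Customer, Product, Year)]
compare(DFO1, DFO) #TRUE

As someone who has only just started learning to use apply() within data.table, this is hard for me. I would appreciate any ideas for optimizing this. I am willing to learn from the process. Thank you for your time and any help.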
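A rough way to see why the expand.grid approach is memory-hungry is to compare row counts. The sketch below only counts rows on the sample data; the names `n_grid` and `n_bound` are illustrative, and the bound `k * nrow(DFI)` assumes that each input row can add at most k - 1 carried-forward rows.

```r
DFI <- data.frame(
  Year     = c(2011, 2013, 2014, 2010, 2012, 2015),
  Customer = c(13575, 13575, 13575, 13575, 13576, 13576),
  Product  = c("R", "R", "R", "W", "S", "R"),
  Rev      = c(4, 1, 2, 1, 2, 2)
)
k <- 2

# Rows produced by expand.grid: every Year x Customer x Product combination.
n_grid <- length(unique(DFI$Year)) *
  length(unique(DFI$Customer)) *
  length(unique(DFI$Product))

# Upper bound on rows actually needed: each input row plus k - 1 carried years.
n_bound <- k * nrow(DFI)

c(grid = n_grid, needed_at_most = n_bound)
```

On the sample data the grid has 36 rows while at most 12 are needed (DFO has 10). The grid grows with the product of the distinct values, whereas the bound grows only linearly with the number of input rows.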

1 Answer:

Answer 0 (score: 1)

Explanation inline. This uses @G.Grothendieck's Sum function and his application of zoo::rollapplyr from "Cumulative sum in a window (or running window sum) based on a condition in R".

k <- 2
Sum <- function(x) {
    x <- matrix(x,, 2)
    FY <- x[, 1]
    Rev <- x[, 2]
    ok <- FY >= tail(FY, 1) - k + 1
    sum(Rev[ok])
}    


library(data.table)
setDT(DFI)
#This is prob the only difference from your solution
#create a combination of year to year + k for each Customer and product.
#Then subset to remove future years
combis <- unique(rbindlist(lapply(seq_len(k), 
    function(n) unique(DFI[, .(Year=Year+n-1, Customer, Product)]))))[
        Year <= DFI[,max(Year)]]

#lookup revenue
out <- DFI[combis, on=.(Year, Customer, Product)][,
    Rev := ifelse(is.na(Rev), 0, Rev)]

#order before summing
setorder(out, Customer,Product,Year)
out[,CumRev := zoo::rollapplyr(.SD, k, Sum, by.column = FALSE, partial = TRUE),
    by = c("Customer", "Product"), .SDcols = c("Year", "Rev")][]
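
For comparison, here is a sketch of a different technique that needs neither expand.grid nor zoo: expand each input row into the k years it contributes to, then aggregate. This is an assumption-laden alternative, not the method used in the answer above; the column names `Carried` and `Own` and the variable `offset` are invented for illustration.

```r
library(data.table)

DFI <- data.table(
  Year     = c(2011, 2013, 2014, 2010, 2012, 2015),
  Customer = c(13575, 13575, 13575, 13575, 13576, 13576),
  Product  = c("R", "R", "R", "W", "S", "R"),
  Rev      = c(4, 1, 2, 1, 2, 2)
)
k <- 2
max_year <- DFI[, max(Year)]

# Each input row contributes its Rev to the window sums of Year .. Year + k - 1.
# 'Own' keeps the revenue only on the row's original year (offset 0).
expanded <- rbindlist(lapply(seq_len(k) - 1, function(offset)
  DFI[, .(Year = Year + offset, Customer, Product,
          Carried = Rev, Own = Rev * (offset == 0))]))

# Do not create years beyond the range of the input table.
expanded <- expanded[Year <= max_year]

# Summing contributions per key gives Rev and the running window sum.
out <- expanded[, .(Rev = sum(Own), CumRev = sum(Carried)),
                by = .(Customer, Product, Year)]
setorder(out, Customer, Product, Year)
out[]
```

The intermediate table has at most `k * nrow(DFI)` rows, so memory use scales with the input size rather than with the number of possible Year, Customer, and Product combinations. On the sample data the Rev and CumRev columns match DFO.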