Calculating in R

Time: 2018-01-17 06:11:06

Tags: r dplyr data.table

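I want to find the running sum of a variable in a data.table dt in R and, for each group, return the month in which this running sum becomes greater than or equal to a threshold given in another column.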

library(data.table)
dt <- data.table(pno = c("A","A","A","A","A","A","A","B","B", "B", "C", "C" ), 
                 month = c("Jan","Feb", "Mar", "Apr", "May", "Jun","Jul", "Jun", "Jul", "Aug", "Mar", "Apr"),
                 x = c(1,2,1,3,2,4,1,3,4,2,4,2),
                 min_x_reqd = c(5,5,5,5,5,5,5,3,3,3,4,4),
                 min_mon = c(4,4,4,4,4,4,4,3,3,3,2,2))

The data.table dt looks like:

dt
    pno month x min_x_reqd min_mon
 1:   A   Jan 1          5       4
 2:   A   Feb 2          5       4
 3:   A   Mar 1          5       4
 4:   A   Apr 3          5       4
 5:   A   May 2          5       4
 6:   A   Jun 4          5       4
 7:   A   Jul 1          5       4
 8:   B   Jun 3          3       3
 9:   B   Jul 4          3       3
10:   B   Aug 2          3       3
11:   C   Mar 4          4       2
12:   C   Apr 2          4       2

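For example: based on the data above, I want to compute, for each pno, the sum of x over a moving window whose length is given in min_mon. Whenever this sum is greater than or equal to the threshold given in min_x_reqd, I want to return the first month of the window in which the condition is met.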
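To make the window arithmetic concrete, here is a minimal base R sketch for pno "A" only. It assumes the window length comes from min_mon (4 for "A") and the threshold from min_x_reqd (5 for "A"). The variable names window, threshold, starts and window_sums are illustrative and not part of the original data.

```r
# Values for pno "A", copied from the table above
x         <- c(1, 2, 1, 3, 2, 4, 1)
month     <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul")
window    <- 4  # min_mon for pno "A" (assumed to be the window length)
threshold <- 5  # min_x_reqd for pno "A" (assumed to be the threshold)

# Start position of every full window of length `window`
starts <- seq_len(length(x) - window + 1)

# Sum of x over each window, e.g. Jan-Apr is 1 + 2 + 1 + 3 = 7
window_sums <- vapply(starts, function(i) sum(x[i:(i + window - 1)]), numeric(1))
window_sums
# 7 8 10 10

# First month of the first window whose sum reaches the threshold
first_month <- month[starts[which(window_sums >= threshold)[1]]]
first_month
# "Jan"
```

The first window (Jan to Apr) already sums to 7, which meets the threshold of 5, so "Jan" is returned for this group.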

So in our case, based on the data, my output should be the first month of the first qualifying window for each pno.

How can I do this using data.table or a data frame?

1 Answer:

Answer 0: (score: 2)

We can use roll_sum from RcppRoll to compute the rolling sum, and then, for each 'pno', subset the first month for which the logical condition is met.

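library(RcppRoll)
library(data.table)
# For each pno, sum x over left-aligned windows of length min_mon and return
# the first month of the first window whose sum is >= min_x_reqd
dt[, .(month = month[which(roll_sum(x, min_mon[1], 
             fill = 0, align = "left") >= min_x_reqd)[1]]), by = pno]
#   pno month
#1:   A   Jan
#2:   B   Jun
#3:   C   Mar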
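
As a possible alternative that avoids the extra package, the same idea can be sketched with data.table's own frollsum. This assumes a data.table version that provides frollsum (releases from around the time of this question did not) and uses >= to match the "greater than or equal to" wording of the question. The name res is illustrative.

```r
library(data.table)

dt <- data.table(
  pno        = c("A","A","A","A","A","A","A","B","B","B","C","C"),
  month      = c("Jan","Feb","Mar","Apr","May","Jun","Jul","Jun","Jul","Aug","Mar","Apr"),
  x          = c(1,2,1,3,2,4,1,3,4,2,4,2),
  min_x_reqd = c(5,5,5,5,5,5,5,3,3,3,4,4),
  min_mon    = c(4,4,4,4,4,4,4,3,3,3,2,2)
)

# frollsum() fills incomplete windows with NA by default; which() skips NA,
# so only full, left-aligned windows can satisfy the condition.
res <- dt[, .(month = month[which(
  frollsum(x, n = min_mon[1], align = "left") >= min_x_reqd[1]
)[1]]), by = pno]
res
# A -> Jan, B -> Jun, C -> Mar
```

If no window in a group reaches its threshold, which(...)[1] is NA and the month for that group comes back as NA.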