Question

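This question is similar to dplyr / R cumulative sum with reset, which asked for a way to reset a cumulative sum based on a threshold. The accepted answer to that question is a function that resets the accumulation using a fixed threshold.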

library(tidyverse)

sum_reset_at <- function(thresh) {
    function(x) {
        accumulate(x, ~if_else(.x >= thresh, .y, .x + .y))
    }
}

df <- tibble(a = c(2, 3, 1, 2, 2, 3))

df %>% mutate(c = sum_reset_at(5)(a))

## # A tibble: 6 x 2
##       a     c
##   <dbl> <dbl>
## 1     2     2
## 2     3     5
## 3     1     1
## 4     2     3
## 5     2     5
## 6     3     3

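When the cumulative sum reaches (or exceeds) the threshold, it restarts from the value of a in the next record.

Instead of supplying a fixed threshold, I would like to supply a vector of thresholds that is accessed sequentially, advancing to the next threshold at each reset. The desired result is: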

thresholds <- c(5, 3, 2)

df %>% mutate(c = sum_reset_at(thresholds)(a))

## # A tibble: 6 x 2
##       a     c
##   <dbl> <dbl>
## 1     2     2
## 2     3     5
## 3     1     1
## 4     2     3
## 5     2     2
## 6     3     3

The vector would be recycled as needed.

I have gotten something to work by using sample in the function:

set.seed(0)

sum_reset_at <- function(thresh) {
    function(x) {
        accumulate(x, ~if_else(.x >= sample(thresh, size = 1), .y, .x + .y))
    }
}

thresholds <- c(5, 3, 2)

df %>% mutate(c = sum_reset_at(thresholds)(a))

## # A tibble: 6 x 2
##       a     c
##   <dbl> <dbl>
## 1     2     2
## 2     3     3
## 3     1     4
## 4     2     2
## 5     2     4
## 6     3     3

But I don't want to sample the thresholds randomly; I want to access them sequentially.

Answer 1

You can modify sum_reset_at to accept a vector of thresholds:

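sum_reset_at <- function(thresh) {
  function(x) {
    # i indexes the threshold currently in effect; it starts at 1 on every call
    i <- 1
    accumulate(x, function(.x, .y) {
      if (.x >= thresh[i]) {
        # Threshold reached: advance to the next threshold, wrapping around
        # to the first one after the last, and restart the sum from .y
        i <<- i + 1
        if (i > length(thresh)) i <<- 1
        .y
      } else {
        .x + .y
      }
    })
  }
}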

df <- tibble(a = c(2, 3, 1, 2, 2, 3))

df %>% mutate(c = sum_reset_at(c(5,3,1))(a))
## A tibble: 6 x 2
#      a     c
#  <dbl> <dbl>
#1     2     2
#2     3     5
#3     1     1
#4     2     3
#5     2     2
#6     3     3
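
For comparison, the same idea can be written as a plain loop in base R, without purrr. This is only a sketch: cumsum_reset_seq is a hypothetical helper name, and the modulo index is one assumed way to recycle the thresholds. It counts the resets so far and uses that count to pick the threshold currently in effect.

```r
# Base-R sketch of a cumulative sum that resets at sequentially accessed
# thresholds. cumsum_reset_seq is a hypothetical name, not a package function.
cumsum_reset_seq <- function(x, thresh) {
  out <- numeric(length(x))
  acc <- 0  # running total since the last reset
  k <- 0    # number of resets so far
  for (j in seq_along(x)) {
    acc <- acc + x[j]
    out[j] <- acc
    # Pick the current threshold, recycling the vector via modulo indexing
    if (acc >= thresh[k %% length(thresh) + 1]) {
      acc <- 0
      k <- k + 1
    }
  }
  out
}

cumsum_reset_seq(c(2, 3, 1, 2, 2, 3), c(5, 3, 2))
# [1] 2 5 1 3 2 3
```

Inside a pipeline it could be called as df %>% mutate(c = cumsum_reset_seq(a, thresholds)). A length-one thresh reproduces the fixed-threshold behaviour from the question.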

Cumulative sum with reset using multiple sequentially accessed thresholds
