Question

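I need to create a column in which each observation equals the previous observation multiplied by one plus the observation in another column. In the example below I am trying to create the indx column. indx[1] is hard-coded as 1.000, but indx[2] = indx[1] * (1 + chng[2]).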

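I have been using mutate from the dplyr library to create new columns, but I can't see how to refer to the previous value of a column while that column is being created.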

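Edit: Updated the example below to reflect that after every 5 observations the values of i and chng reset to 0 and 0.000 respectively. When this happens, indx also needs to reset to 1.000 and start accumulating again.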

Sample data table:

test <- data.frame(i = c(0,1,2,3,4,0,1,2,3,4)
               ,chng = c(.000,.031,.005,-.005,.017,.000,.012,.003,-.013,-.005)
               ,indx = c(1,1.031,1.037,1.031,1.048,1,1.012,1.015,1.002,.997))

     i   chng  indx
 1:  0  0.000 1.000
 2:  1  0.031 1.031
 3:  2  0.005 1.037
 4:  3 -0.005 1.031
 5:  4  0.017 1.048
 6:  0  0.000 1.000
 7:  1  0.012 1.012
 8:  2  0.003 1.015
 9:  3 -0.013 1.002
10:  4 -0.005 0.997

Answer 1

Mathematically, this is the same as cumprod(test$chng + 1):

library(dplyr)
test %>% mutate(indx = cumprod(chng + 1))

For the original five-row example, this gives:

  i   chng     indx
1 0  0.000 1.000000
2 1  0.031 1.031000
3 2  0.005 1.036155
4 3 -0.005 1.030974
5 4  0.017 1.048501
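
To see why cumprod reproduces the recurrence, here is a minimal sketch that builds the column with an explicit loop and compares the two. The names indx_loop and indx_cumprod are only illustrative, and the check relies on chng[1] being 0 so that the first value is 1.

```r
# Illustrative check: explicit recurrence vs. cumprod()
chng <- c(0.000, 0.031, 0.005, -0.005, 0.017)

indx_loop <- numeric(length(chng))
indx_loop[1] <- 1 + chng[1]              # chng[1] is 0, so this is the hard-coded 1
for (k in seq_along(chng)[-1]) {         # k = 2, 3, ..., length(chng)
  indx_loop[k] <- indx_loop[k - 1] * (1 + chng[k])
}

indx_cumprod <- cumprod(1 + chng)        # same product, accumulated in one call

all.equal(indx_loop, indx_cumprod)       # TRUE
```

Each step multiplies the running value by (1 + chng[k]), which is exactly what a cumulative product of 1 + chng does.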

For the updated question, create a grouping variable g and apply the above within each group:

test %>%
  group_by(g = cumsum(i == 0)) %>%
  mutate(indx = cumprod(chng + 1)) %>%
  ungroup %>%
  select(-g)
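
As a small illustration of how the grouping variable behaves, the sketch below computes it on its own. It assumes, as in the sample data, that every new run starts with i == 0.

```r
# cumsum(i == 0) increases by one each time i resets to 0,
# so every run of observations gets its own group id.
i <- c(0, 1, 2, 3, 4, 0, 1, 2, 3, 4)
g <- cumsum(i == 0)
g
# 1 1 1 1 1 2 2 2 2 2
```

Because cumprod is then applied per group, the product restarts at the first row of each run.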

Answer 2

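As you said, you need the previous value of the column/variable you are creating. This is a sequential process, and one option is to use Reduce (instead of a for loop):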

test <- data.frame(i = c(0:4)
                   ,chng = c(.000,.031,.005,-.005,.017))

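# init = 1 seeds the accumulation; [-1] drops that seed so the result has one value per row
test$indx <- Reduce(function(x, y) x * (1 + y), test$chng, accumulate = TRUE, init = 1)[-1]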

test

#   i   chng     indx
# 1 0  0.000 1.000000
# 2 1  0.031 1.031000
# 3 2  0.005 1.036155
# 4 3 -0.005 1.030974
# 5 4  0.017 1.048501
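
The trailing [-1] can look odd at first, so here is a minimal sketch of what Reduce returns before it is dropped. The name steps is only illustrative.

```r
chng <- c(0.000, 0.031, 0.005, -0.005, 0.017)

# With accumulate = TRUE and an init value, Reduce returns the init value
# followed by one accumulated result per element of chng.
steps <- Reduce(function(x, y) x * (1 + y), chng, accumulate = TRUE, init = 1)

length(steps)   # 6: the seed plus five accumulated values
steps[-1]       # drop the seed to get one value per row
```

Without [-1] the vector would be one element longer than the data frame has rows, and the assignment to test$indx would fail.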

For the case where i resets, you can use the following:

test <- data.frame(i = c(0,1,2,3,4,0,1,2,3,4)
                   ,chng = c(.000,.031,.005,-.005,.017,.000,.012,.003,-.013,-.005))

library(tidyverse)

test %>%
  group_by(group = cumsum(i == 0)) %>%   # create a group based on i column
  mutate(indx = Reduce(function(x, y) x * (1 + y), chng, accumulate = TRUE, init = 1)[-1]) %>%  # apply the Reduce function to each group
  ungroup() %>%                          # forget the grouping
  select(-group) %>%                     # remove group column
  data.frame()                           # only for visualisation purposes (see the decimals)

#    i   chng      indx
# 1  0  0.000 1.0000000
# 2  1  0.031 1.0310000
# 3  2  0.005 1.0361550
# 4  3 -0.005 1.0309742
# 5  4  0.017 1.0485008
# 6  0  0.000 1.0000000
# 7  1  0.012 1.0120000
# 8  2  0.003 1.0150360
# 9  3 -0.013 1.0018405
# 10 4 -0.005 0.9968313
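
If you would rather not load any packages, a base R sketch of the same grouped calculation is possible with ave. This is an alternative I am suggesting, not something taken from the answers above, and it again assumes each run starts with i == 0.

```r
test <- data.frame(i = c(0, 1, 2, 3, 4, 0, 1, 2, 3, 4),
                   chng = c(.000, .031, .005, -.005, .017,
                            .000, .012, .003, -.013, -.005))

# ave() applies FUN within each level of the grouping vector and
# returns the results in the original row order.
test$indx <- ave(1 + test$chng, cumsum(test$i == 0), FUN = cumprod)

test
```

Note that FUN has to be passed by name, because ave treats unnamed extra arguments as grouping variables.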

Referring to the previous value of a column when creating that column
