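I need to create a column in which each observation equals the previous observation multiplied by one plus the observation in another column. I am trying to create the indx column in the example below. indx[1] is hard-coded as 1.000, but indx[2] = indx[1] * (1 + chng[2]).
I have been using mutate from the dplyr library to create new columns, but I can't see how to refer to the previous value of a column while that column is being created.
Edit: Updated the example below to reflect that the data values of i and chng reset to 0 and 0.000, respectively, after every 5 observations; when that happens, indx also needs to reset to 1.000 and start accumulating again.
Example data table: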
test <- data.frame(i = c(0,1,2,3,4,0,1,2,3,4)
,chng = c(.000,.031,.005,-.005,.017,.000,.012,.003,-.013,-.005)
,indx = c(1,1.031,1.037,1.031,1.048,1,1.012,1.015,1.002,.997))
i chng indx
1: 0 0.000 1.000
2: 1 0.031 1.031
3: 2 0.005 1.037
4: 3 -0.005 1.031
5: 4 0.017 1.048
6: 0 0.000 1.000
7: 1 0.012 1.012
8: 2 0.003 1.015
9: 3 -0.013 1.002
10: 4 -0.005 0.997
Answer 0 (score: 2)
Mathematically, this is the same as cumprod(test$chng + 1):
library(dplyr)
test %>% mutate(indx = cumprod(chng + 1))
which gives:
i chng indx
1 0 0.000 1.000000
2 1 0.031 1.031000
3 2 0.005 1.036155
4 3 -0.005 1.030974
5 4 0.017 1.048501
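To see why the recurrence and the cumulative product agree, the recurrence from the question can be unrolled. This is only a sketch: the subscripted symbols stand for indx[k] and chng[k], and it assumes the first chng value is 0, as in the example data. If chng[1] were nonzero, the first element of cumprod would be 1 + chng[1] rather than the hard-coded 1.

```latex
\begin{aligned}
\mathrm{indx}_1 &= 1 = 1 + \mathrm{chng}_1 \quad (\text{since } \mathrm{chng}_1 = 0),\\
\mathrm{indx}_k &= \mathrm{indx}_{k-1}\,(1 + \mathrm{chng}_k), \qquad k \ge 2,\\
\Rightarrow\quad \mathrm{indx}_k &= \prod_{j=1}^{k} (1 + \mathrm{chng}_j).
\end{aligned}
```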
For the updated question, create a grouping variable g and apply the above by group:
test %>%
group_by(g = cumsum(i == 0)) %>%
mutate(indx = cumprod(chng + 1)) %>%
ungroup() %>%
select(-g)
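If dplyr is not available, the same grouped cumulative product can probably be done in base R as well. The following is a minimal sketch, not part of the original answer; it assumes the same reset rule (a new group starts wherever i equals 0) and uses ave() to apply cumprod within each group.

```r
# Base-R sketch: grouped cumulative product without dplyr.
test <- data.frame(
  i    = c(0, 1, 2, 3, 4, 0, 1, 2, 3, 4),
  chng = c(.000, .031, .005, -.005, .017, .000, .012, .003, -.013, -.005)
)

# A new group starts wherever i resets to 0.
g <- cumsum(test$i == 0)

# ave() applies FUN within each group and returns the results in the original row order.
test$indx <- ave(test$chng + 1, g, FUN = cumprod)
test
```

Because g is kept as a separate vector rather than a column, there is no helper column to drop afterwards.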
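Answer 1 (score: 0)
As you said, you need the previous value of the column/variable you are creating. This is a sequential process, and one option is to use Reduce (instead of a for loop):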
test <- data.frame(i = c(0:4)
,chng = c(.000,.031,.005,-.005,.017))
test$indx <- Reduce(function(x, y) x * (1 + y), test$chng, accumulate = TRUE, init = 1)[-1]
test
# i chng indx
# 1 0 0.000 1.000000
# 2 1 0.031 1.031000
# 3 2 0.005 1.036155
# 4 3 -0.005 1.030974
# 5 4 0.017 1.048501
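The trailing [-1] in the Reduce call is easy to overlook. As a small sketch (the variable name steps is only illustrative): with accumulate = TRUE and an init value, Reduce returns the seed as its first element, so the result is one element longer than the input and the seed has to be dropped to line up with the rows.

```r
chng <- c(.000, .031, .005)

# With init supplied, the accumulated result starts with the seed value itself.
steps <- Reduce(function(x, y) x * (1 + y), chng, accumulate = TRUE, init = 1)

length(steps)  # one longer than chng, because the seed is included
steps[-1]      # drop the seed so the result aligns with chng
```

For this particular recurrence, steps[-1] should match cumprod(chng + 1); Reduce becomes the more useful tool when the update rule is not a plain product.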
For the case where i resets, you can use the following approach:
test <- data.frame(i = c(0,1,2,3,4,0,1,2,3,4)
,chng = c(.000,.031,.005,-.005,.017,.000,.012,.003,-.013,-.005))
library(tidyverse)
test %>%
group_by(group = cumsum(i == 0)) %>% # create a group based on i column
mutate(indx = Reduce(function(x, y) x * (1 + y), chng, accumulate = TRUE, init = 1)[-1]) %>% # apply the Reduce function to each group
ungroup() %>% # forget the grouping
select(-group) %>% # remove group column
data.frame() # only for visualisation purposes (see the decimals)
# i chng indx
# 1 0 0.000 1.0000000
# 2 1 0.031 1.0310000
# 3 2 0.005 1.0361550
# 4 3 -0.005 1.0309742
# 5 4 0.017 1.0485008
# 6 0 0.000 1.0000000
# 7 1 0.012 1.0120000
# 8 2 0.003 1.0150360
# 9 3 -0.013 1.0018405
# 10 4 -0.005 0.9968313