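I have a data set of growth rates and want to build a chain index with the base-year value set to 100. The problem is that the procedure is iterative: for each t, I need to multiply the chain index at t-1 by (1 + growth_rate) at t, and somehow I can't get this to work for each group in a data.table.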
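Written out, the recursion unrolls into a cumulative product, which is why it does not actually need a row-by-row loop. Here C_t stands for the chain index and g_t for growth_rate in period t (notation introduced only for this derivation):

```latex
C_{1} = 100, \qquad C_{t} = C_{t-1}\,(1 + g_{t}) \quad (t \ge 2)
\quad\Longrightarrow\quad
C_{t} = 100 \prod_{s=2}^{t} (1 + g_{s})
```

If the missing base-year rate is treated as g_1 = 0, this is exactly 100 * cumprod(1 + g) within each group.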
Consider this example data set:
library(data.table)
x1 <- c(NA, runif(9))
x2 <- c(NA, runif(9))
DT <- data.table(
  time = rep(1:10, 2),
  growth_rate = c(x1, x2),
  idx = c(rep("group1", 10),
          rep("group2", 10))
)
time growth_rate idx
1: 1 NA group1
2: 2 0.82593921 group1
3: 3 0.48084893 group1
4: 4 0.65483959 group1
5: 5 0.87944148 group1
6: 6 0.78886104 group1
7: 7 0.87714854 group1
8: 8 0.87268452 group1
9: 9 0.93289483 group1
10: 10 0.05558125 group1
11: 1 NA group2
12: 2 0.36341183 group2
13: 3 0.21488630 group2
14: 4 0.17622914 group2
15: 5 0.50420764 group2
16: 6 0.08646833 group2
17: 7 0.28408027 group2
18: 8 0.20252834 group2
19: 9 0.16940959 group2
20: 10 0.60843486 group2
I tried:
first_value = DT[, .(first_value = .I[c(1L)]), by="idx"]$first_value
DT[first_value,ChainIndex := 100]
DT[,ChainIndex := shift(ChainIndex, type="lag", n=1)*(1+growth_rate), by=idx]
and also a loop (which I'd like to avoid, since my data set contains many groups and rows):
for (row in 1:nrow(DT))
{
if (row %in% first_value)
{DT[row, ChainIndex := as.numeric(100)]}
else
{DT[row, ChainIndex := shift(ChainIndex, type = "lag", n=1)*(1+growth_rate), by=idx]}
}
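Both attempts return NA for most rows for a mechanical reason: `:=` evaluates its right-hand side once for the whole group, so in the first attempt only the second row of each group sees the 100 (and the first row's 100 is overwritten with NA), while in the loop `DT[row, ...]` restricts `j` to a single row, so `shift()` has no previous row to look at. As a minimal sketch on a plain vector of hypothetical rates, an explicit recursion does work and matches a cumulative product:

```r
# Hypothetical growth rates for one group; the base period has no rate (NA).
g <- c(NA, 0.10, -0.05, 0.20)

# Explicit recursion: base period = 100, then C[t] = C[t-1] * (1 + g[t]).
chain_loop <- numeric(length(g))
chain_loop[1] <- 100
for (t in seq_along(g)[-1]) {
  chain_loop[t] <- chain_loop[t - 1] * (1 + g[t])
}

# Closed form: treat the missing base-period rate as 0 and take a cumulative product.
chain_cumprod <- 100 * cumprod(1 + ifelse(is.na(g), 0, g))

print(chain_loop)
print(all.equal(chain_loop, chain_cumprod))
```

The vectorised form is what makes a grouped data.table call possible, since `cumprod()` can run once per group instead of once per row.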
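However, neither approach performs the chaining row by row. In the end, each group should have a ChainIndex of 100 in its first year and ChainIndex(t-1) * (1 + growth_rate) in every other year. Can anyone help?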
Answer 0 (score: 2)
Not sure, since your expected output is missing, but this may help:
#set the growth-rate of the first row of each group to 0
DT[ is.na(growth_rate), growth_rate := 0 ]
#calculate the cumulative product of (growth_rate + 1) within each group
DT[, chain := 100 * cumprod( growth_rate + 1 ), by = .(idx) ]
#reset rows with growth_rate == 0 (the first rows) back to NA
DT[ growth_rate == 0, growth_rate := NA_real_ ][]
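A variant of the same idea that never writes to growth_rate, so no reset step is needed, is to substitute 0 for the missing base-year rate only inside the calculation. This sketch uses data.table's `fcoalesce()` on a small hypothetical data set (the values are made up for reproducibility):

```r
library(data.table)

# Small fixed data set (hypothetical values); group1 contains a genuine 0 rate.
DT <- data.table(
  time        = rep(1:3, 2),
  growth_rate = c(NA, 0.10, 0.00, NA, 0.50, -0.20),
  idx         = rep(c("group1", "group2"), each = 3)
)

# Replace NA with 0 only inside the expression; the growth_rate column is untouched.
DT[, chain := 100 * cumprod(1 + fcoalesce(growth_rate, 0)), by = idx]
print(DT)
```

Because the column is never modified, genuine zero growth rates and the original NAs both survive as they were.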
time growth_rate idx chain
1: 1 NA group1 100.0000
2: 2 0.82593921 group1 182.5939
3: 3 0.48084893 group1 270.3940
4: 4 0.65483959 group1 447.4587
5: 5 0.87944148 group1 840.9725
6: 6 0.78886104 group1 1504.3829
7: 7 0.87714854 group1 2823.9502
8: 8 0.87268452 group1 5288.3677
9: 9 0.93289483 group1 10221.8586
10: 10 0.05558125 group1 10790.0023
11: 1 NA group2 100.0000
12: 2 0.36341183 group2 136.3412
13: 3 0.21488630 group2 165.6390
14: 4 0.17622914 group2 194.8295
15: 5 0.50420764 group2 293.0640
16: 6 0.08646833 group2 318.4047
17: 7 0.28408027 group2 408.8572
18: 8 0.20252834 group2 491.6624
19: 9 0.16940959 group2 574.9547
20: 10 0.60843486 group2 924.7772
An alternative (more data.table-idiomatic) way to set the first growth_rate of each idx group to 0 is:
DT[ DT[, .(.I[1L]), by=idx]$V1, growth_rate := 0][]
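The row numbers returned by `.I[1L]` can also be kept and reused to restore the NAs afterwards, so only the base rows are ever changed, even when a group contains a real growth rate of 0. A minimal sketch on hypothetical data (`first_rows` is an illustrative name):

```r
library(data.table)

# Hypothetical data; both groups contain a genuine zero growth rate.
DT <- data.table(
  time        = rep(1:3, 2),
  growth_rate = c(NA, 0, 0.5, NA, 0.1, 0),
  idx         = rep(c("group1", "group2"), each = 3)
)

# Row numbers (in DT) of the first row of each idx group.
first_rows <- DT[, .I[1L], by = idx]$V1

DT[first_rows, growth_rate := 0]                         # base rows get rate 0
DT[, chain := 100 * cumprod(1 + growth_rate), by = idx]  # chain index per group
DT[first_rows, growth_rate := NA_real_]                  # restore only the base rows
print(DT)
```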
Answer 1 (score: 1)
The cumprod function solves this easily.
library(data.table)
x1 <- c(NA, runif(9))
x2 <- c(NA, runif(9))
DT <- data.table(
  time = rep(1:10, 2),
  growth_rate = c(x1, x2),
  idx = c(rep("group1", 10),
          rep("group2", 10))
)
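DT[
  i = order(time),  # process each group's rows in time order
  j = `:=`(
    # treat the missing base-year rate as 0, so the first value is 100
    value = 100 * cumprod(1 + ifelse(is.na(growth_rate), 0, growth_rate))),
  by = idx]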
print(DT)
time growth_rate idx value
1: 1 NA group1 100.0000
2: 2 0.95908608 group1 195.9086
3: 3 0.25566986 group1 245.9965
4: 4 0.55565852 group1 382.6866
5: 5 0.15934976 group1 443.6676
6: 6 0.73005207 group1 767.5681
7: 7 0.38046874 group1 1059.6037
8: 8 0.11186212 group1 1178.1333
9: 9 0.24389118 group1 1465.4696
10: 10 0.05880406 group1 1551.6452
11: 1 NA group2 100.0000
12: 2 0.39967710 group2 139.9677
13: 3 0.25459351 group2 175.6026
14: 4 0.07636151 group2 189.0119
15: 5 0.65243776 group2 312.3303
16: 6 0.37214618 group2 428.5629
17: 7 0.93790246 group2 830.5131
18: 8 0.57050829 group2 1304.3276
19: 9 0.06343531 group2 1387.0681
20: 10 0.20862719 group2 1676.4482
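The `i = order(time)` argument matters when rows are not already sorted by time within each group. In this sketch the rows are deliberately shuffled (hypothetical values): each group's cumulative product still runs in time order, and `:=` writes each result back to its original row, so DT itself is not reordered.

```r
library(data.table)

# Hypothetical rows, deliberately out of time order within each group.
DT <- data.table(
  time        = c(3, 1, 2, 2, 3, 1),
  growth_rate = c(0.5, NA, 0.1, 0.2, 0.0, NA),
  idx         = c("group1", "group1", "group1", "group2", "group2", "group2")
)

# Groups are formed on the time-ordered subset; results land on the original rows.
DT[order(time),
   value := 100 * cumprod(1 + ifelse(is.na(growth_rate), 0, growth_rate)),
   by = idx]
print(DT)
```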
It gets more complicated if there are gaps in time that need to be imputed.