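I have a data table with two sets of data columns and two id columns. The id columns are year (with values X2010, X2015, X2020, etc.) and country (cty1, cty2, etc.). For each country, the first set of data columns (f1, f2, f3, etc.) has values only in the first row (X2010) and NA in the remaining rows. The second set of columns (x.f1, x.f2, x.f3, etc.) has NA in the first row and different values in the remaining rows. For each country, I want to replace the NAs in the first set of columns using the following recursive structure.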
f1.X2015 = f1.X2010 * x.f1.X2015
f1.X2020 = f1.X2015 * x.f1.X2020
...
I tried the following:
foods <- c("f1", "f2", "f3")
x.foods <- c("x.f1", "x.f2", "x.f3")
res <- c("res.f1", "res.f2", "res.f3")
f.cumprod <- function(x,y) {return(first(x) * cumprod(replace(y), 1,1) * NA^(.I= 1))}
This is the data.table construct that I thought would generate the values for the res columns:
DT[,(res) := mapply(FUN = f.cumprod, x = .SD, y = list(x.foods)), .SDcols = foods, by = c("cty")]
Here is a simplified example data set:
library(data.table)
set.seed(24)
dt <- data.table(cty = c(rep("cty1", 5), rep("cty2", 5), rep("cty3", 5)), year = rep(c("X2010", "X2015", "X2020", "X2025", "X2030"), 3),
f1 = rep(c(0.9883415, rep(NA, 4)), 3), f2 = rep(c(1.0685221, rep(NA, 4)), 3), f3 = rep(c(1.0664189, rep(NA, 4)), 3),
x.f1 = rep(c(NA, rep(rnorm(4))), 3), x.f2 = rep(c(NA, rep(rnorm(4))), 3), x.f3 = rep(c(NA, rep(rnorm(4))), 3))
A kludgy, slowwww way to get the result for one of the foods, f1:
dt.subset <- dt[, c("f1", "x.f1"), with = FALSE]
for (i in 2:nrow(dt.subset)) {
dt.subset$f1[i] <- dt.subset$f1[i - 1] * dt.subset$x.f1[i]
}
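Since I want to do this for about 170 countries and 20 food items (and 4 scenarios), I'm hoping for a solution along the lines of the DT code above.
Answer 0 (score: 2)
If we are looking for a recursive function (for a single 'cty'):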
dt.subset[, f1 := Reduce(`*`, x.f1[-1], init = f1[1], accumulate = TRUE)]
Or with accumulate from purrr:
library(purrr)
dt.subset[, f1 := accumulate(x.f1[-1], ~ .x * .y, .init = f1[1])]
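To make the running-product idea concrete, here is a minimal base R sketch with made-up numbers. The names start and mult are hypothetical and not part of the OP's data. It only illustrates that Reduce with accumulate = TRUE, a plain cumprod, and the OP's row-by-row loop should give the same vector for a single country.

```r
# Hypothetical toy values: one starting level and a multiplier column
# whose first entry is NA, mirroring the layout of f1 and x.f1.
start <- 2
mult  <- c(NA, 3, 0.5, 4, 10)

# Reduce keeps every intermediate product when accumulate = TRUE.
via_reduce <- Reduce(`*`, mult[-1], init = start, accumulate = TRUE)

# The same recursion is a cumulative product over c(start, multipliers).
via_cumprod <- cumprod(c(start, mult[-1]))

# The explicit loop from the question, for comparison.
via_loop <- c(start, rep(NA_real_, length(mult) - 1))
for (i in 2:length(mult)) {
  via_loop[i] <- via_loop[i - 1] * mult[i]
}

via_reduce
# [1]   2   6   3  12 120
```

All three vectors agree here, which is why the loop can be swapped for a vectorised call inside a grouped data.table expression.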
Based on the OP's data 'dt', we can melt it into 'long' format, apply the function with accumulate by group, and then dcast it back to 'wide' format:
out <- dcast(melt(dt, measure = patterns("^f\\d+", "^x\\.f\\d+"))[,
accumulate(value2[-1], ~ .x * .y, .init = value1[1]), .(variable, cty)],
cty + rowid(variable) ~ variable, value.var = "V1")
nm1 <- grep("^f\\d+$", names(dt), value = TRUE)
setnames(out, -(1:2), nm1)
Then set the columns of interest with the new values:
for(j in nm1) set(dt, i= NULL, j= j, value = out[[j]])
dt
# cty year f1 f2 f3 x.f1 x.f2 x.f3
# 1: cty1 X2010 0.98834150 1.0685221 1.066418900 NA NA NA
# 2: cty1 X2015 -0.53951661 0.9055298 -0.904717849 -0.5458808 0.8474600 -0.848370044
# 3: cty1 X2020 -0.28949668 0.2408908 -0.002091656 0.5365853 0.2660220 0.002311942
# 4: cty1 X2025 -0.12147951 0.1070965 0.002754518 0.4196231 0.4445853 -1.316908124
# 5: cty1 X2030 0.07089875 -0.0499600 0.001647943 -0.5836272 -0.4664951 0.598269113
# 6: cty2 X2010 0.98834150 1.0685221 1.066418900 NA NA NA
# 7: cty2 X2015 -0.53951661 0.9055298 -0.904717849 -0.5458808 0.8474600 -0.848370044
# 8: cty2 X2020 -0.28949668 0.2408908 -0.002091656 0.5365853 0.2660220 0.002311942
# 9: cty2 X2025 -0.12147951 0.1070965 0.002754518 0.4196231 0.4445853 -1.316908124
#10: cty2 X2030 0.07089875 -0.0499600 0.001647943 -0.5836272 -0.4664951 0.598269113
#11: cty3 X2010 0.98834150 1.0685221 1.066418900 NA NA NA
#12: cty3 X2015 -0.53951661 0.9055298 -0.904717849 -0.5458808 0.8474600 -0.848370044
#13: cty3 X2020 -0.28949668 0.2408908 -0.002091656 0.5365853 0.2660220 0.002311942
#14: cty3 X2025 -0.12147951 0.1070965 0.002754518 0.4196231 0.4445853 -1.316908124
#15: cty3 X2030 0.07089875 -0.0499600 0.001647943 -0.5836272 -0.4664951 0.598269113
Checking against the values of 'dt.subset' for the first 'cty' after applying the OP's loop:
dt.subset
# f1 x.f1
#1: 0.98834150 NA
#2: -0.53951661 -0.5458808
#3: -0.28949668 0.5365853
#4: -0.12147951 0.4196231
#5: 0.07089875 -0.5836272
Or we can use Map:
dt[, (foods) := Map(function(x, y) accumulate(y[-1], `*`, .init = x[1]),
mget(foods), mget(x.foods)), by = .(cty)]
dt
# cty year f1 f2 f3 x.f1 x.f2 x.f3
# 1: cty1 X2010 0.98834150 1.0685221 1.066418900 NA NA NA
# 2: cty1 X2015 -0.53951661 0.9055298 -0.904717849 -0.5458808 0.8474600 -0.848370044
# 3: cty1 X2020 -0.28949668 0.2408908 -0.002091656 0.5365853 0.2660220 0.002311942
# 4: cty1 X2025 -0.12147951 0.1070965 0.002754518 0.4196231 0.4445853 -1.316908124
# 5: cty1 X2030 0.07089875 -0.0499600 0.001647943 -0.5836272 -0.4664951 0.598269113
# 6: cty2 X2010 0.98834150 1.0685221 1.066418900 NA NA NA
# 7: cty2 X2015 -0.53951661 0.9055298 -0.904717849 -0.5458808 0.8474600 -0.848370044
# 8: cty2 X2020 -0.28949668 0.2408908 -0.002091656 0.5365853 0.2660220 0.002311942
# 9: cty2 X2025 -0.12147951 0.1070965 0.002754518 0.4196231 0.4445853 -1.316908124
#10: cty2 X2030 0.07089875 -0.0499600 0.001647943 -0.5836272 -0.4664951 0.598269113
#11: cty3 X2010 0.98834150 1.0685221 1.066418900 NA NA NA
#12: cty3 X2015 -0.53951661 0.9055298 -0.904717849 -0.5458808 0.8474600 -0.848370044
#13: cty3 X2020 -0.28949668 0.2408908 -0.002091656 0.5365853 0.2660220 0.002311942
#14: cty3 X2025 -0.12147951 0.1070965 0.002754518 0.4196231 0.4445853 -1.316908124
#15: cty3 X2030 0.07089875 -0.0499600 0.001647943 -0.5836272 -0.4664951 0.598269113
Or, if we use cumprod (there were some errors in the OP's f.cumprod function), it can be changed to:
f.cumprod <- function(x, y) cumprod(c(x[1], y[-1]))
dt[, (foods) := Map(f.cumprod, mget(foods), mget(x.foods)), by = .(cty)]
dt
# cty year f1 f2 f3 x.f1 x.f2 x.f3
# 1: cty1 X2010 0.98834150 1.0685221 1.066418900 NA NA NA
# 2: cty1 X2015 -0.53951661 0.9055298 -0.904717849 -0.5458808 0.8474600 -0.848370044
# 3: cty1 X2020 -0.28949668 0.2408908 -0.002091656 0.5365853 0.2660220 0.002311942
# 4: cty1 X2025 -0.12147951 0.1070965 0.002754518 0.4196231 0.4445853 -1.316908124
# 5: cty1 X2030 0.07089875 -0.0499600 0.001647943 -0.5836272 -0.4664951 0.598269113
# 6: cty2 X2010 0.98834150 1.0685221 1.066418900 NA NA NA
# 7: cty2 X2015 -0.53951661 0.9055298 -0.904717849 -0.5458808 0.8474600 -0.848370044
# 8: cty2 X2020 -0.28949668 0.2408908 -0.002091656 0.5365853 0.2660220 0.002311942
# 9: cty2 X2025 -0.12147951 0.1070965 0.002754518 0.4196231 0.4445853 -1.316908124
#10: cty2 X2030 0.07089875 -0.0499600 0.001647943 -0.5836272 -0.4664951 0.598269113
#11: cty3 X2010 0.98834150 1.0685221 1.066418900 NA NA NA
#12: cty3 X2015 -0.53951661 0.9055298 -0.904717849 -0.5458808 0.8474600 -0.848370044
#13: cty3 X2020 -0.28949668 0.2408908 -0.002091656 0.5365853 0.2660220 0.002311942
#14: cty3 X2025 -0.12147951 0.1070965 0.002754518 0.4196231 0.4445853 -1.316908124
#15: cty3 X2030 0.07089875 -0.0499600 0.001647943 -0.5836272 -0.4664951 0.598269113
NOTE: The values are the same for each 'cty' because the example data set has the same values for each 'cty'.
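Because the example data repeat the same numbers for every country, the output above cannot show that the grouping is doing its job. A small sketch with different values per country may help. The table toy and the country labels ctyA and ctyB are hypothetical, and only a single food column is used to keep it short.

```r
library(data.table)

# Hypothetical toy data: two countries with different starting values
# and different multipliers.
toy <- data.table(
  cty  = rep(c("ctyA", "ctyB"), each = 3),
  year = rep(c("X2010", "X2015", "X2020"), 2),
  f1   = c(1, NA, NA, 10, NA, NA),
  x.f1 = c(NA, 2, 3, NA, 0.5, 4)
)

# Same helper as in the answer: first value of x, then the multipliers in y.
f.cumprod <- function(x, y) cumprod(c(x[1], y[-1]))

# by = cty restarts the cumulative product at each country's first row.
toy[, f1 := f.cumprod(f1, x.f1), by = cty]

toy$f1
# [1]  1  2  6 10  5 20
```

Each country starts again from its own X2010 value, so the product for ctyB does not carry over anything from ctyA. This assumes the rows are already ordered by year within each country.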