我试图想出一种在数据框中进行投影的方法,最好是dplyr
。
library("dplyr")
set.seed(1)
df0 <- data_frame(t = 0:5,
r = c(NA, rnorm(n = 5, mean = 1, sd = 0.1)),
P = c(100, rep(x = NA, times = 5)))
df0
# Source: local data frame [6 x 3]
#
# t r P
# (int) (dbl) (dbl)
# 1 0 NA 100
# 2 1 0.9373546 NA
# 3 2 1.0183643 NA
# 4 3 0.9164371 NA
# 5 4 1.1595281 NA
# 6 5 1.0329508 NA
我对如何以递归方式运行投影模型感到困惑......
df0 %>%
mutate(P = ifelse(test = is.na(P), yes = lag(P)*r, no = P))
# Source: local data frame [6 x 3]
#
# t r P
# (int) (dbl) (dbl)
# 1 0 NA 100.00000
# 2 1 0.9373546 93.73546
# 3 2 1.0183643 NA
# 4 3 0.9164371 NA
# 5 4 1.1595281 NA
# 6 5 1.0329508 NA
有人知道这是否可行?
我想到的是使用group_by
在多个地区执行此操作。数据框将非常大,因此优先考虑快速解决除data.frame
类型对象之外的其他问题。
到目前为止,我能想到的唯一解决方案是使用for
循环...
for(i in 1:5)
df0 <- df0 %>% mutate(P = ifelse(is.na(P), yes = lag(P)*r, no = P))
df0
# Source: local data frame [6 x 3]
#
# t r P
# (int) (dbl) (dbl)
# 1 0 NA 100.00000
# 2 1 0.9373546 93.73546
# 3 2 1.0183643 95.45685
# 4 3 0.9164371 87.48020
# 5 4 1.1595281 101.43575
# 6 5 1.0329508 104.77814
...这可能导致我的大数据集出现内存问题,并且考虑到我在R中读到的for
循环,可能不是最好的解决方案。
修改
使用purrr
对simulations非常相似的问题的一些不错的答案。写在博客post中。
答案 0 :(得分:1)
因为您提到了快速解决方案,所以可能是data.table
解决方案?
DT <- data.table(df0)
for(i in 1:nrow(DT))
set(DT, j = 3L, value = ifelse(is.na(DT$P), yes = lag(DT$P)*DT$r, no = DT$P))
DT
t r P
1: 0 NA 100.00000
2: 1 0.9373546 93.73546
3: 2 1.0183643 95.45685
4: 3 0.9164371 87.48020
5: 4 1.1595281 101.43575
6: 5 1.0329508 104.77814
速度比较......
f_dt <-
function(){
for (i in 1:nrow(DT))
set(DT, j = 3L, value = ifelse(is.na(DT[,P]), yes = lag(DT$P)*DT$r, no = DT$P))
DT
}
f_dplyr <-
function(){
for (i in 1:nrow(df0))
df0 <- mutate(df0, P = ifelse(is.na(P), yes = lag(P)*r, no = P))
df0
}
f_cumprod <-
function(){
res <- c(df0$P[1],df0$P[1]*cumprod(df0$r[-1]))
res
}
library(microbenchmark)
microbenchmark(f_dt(),f_dplyr(),f_cumprod(),times = 100)
Unit: microseconds # only 500 rows
expr min lq mean median uq max neval
f_dt() 178350.056 186226.605 192842.91784 190115.9120 195791.748 272405.911 100
f_dplyr() 307450.092 323326.566 331586.39073 328444.5255 335888.287 387716.640 100
f_cumprod() 27.798 34.213 45.11819 43.4075 52.175 75.268 100