I would like to know whether the following calculation can be done with dplyr.
library(dplyr)
x <- data.frame(
  yr = c(2012, 2013, 2014, 2015, 2016),
  rate = c(1.1, 1.2, 0.8, -0.4, 0.5)
) %>% arrange(desc(yr))
This is how I want to compute it (pseudocode, with rows ordered by descending yr):
y[i] = ifelse(yr == max(yr), 100,
100 * y[i-1]/(100 + rate[i-1]))
If I try it like this:
x %>%
mutate(
y = ifelse(
yr == max(yr), 100,
100 * lag(y) / (100 + lag(rate))
)
)
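it returns the following error: Evaluation error: object 'y' not found.
As the title reflects, I want a dplyr solution that works inside the pipe, without packages such as zoo or data.table, mainly because dplyr code can be translated to SQL for different databases.
Is this possible?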
Answer 0 (score: 3)
One option is to use accumulate from purrr:
library(tidyverse)
x %>%
mutate(y = accumulate(rate[-n()],
~ 100 * .x/(100 + .y),
.init = 100))
# yr rate y
#1 2016 0.5 100.00000
#2 2015 -0.4 99.50249
#3 2014 0.8 99.90210
#4 2013 1.2 99.10922
#5 2012 1.1 97.93401
This can also be done in base R with Reduce:
Reduce(function(u, v) 100 * u / (100 + v), x$rate[-nrow(x)], init = 100, accumulate = TRUE)
#[1] 100.00000 99.50249 99.90210 99.10922 97.93401
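Following the OP's logic, the first element is initialized to 100, and each later element is computed from the previous one: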
> 100 * (100)/(100 + 0.5) # 2nd element
[1] 99.50249
> 100 * 99.50249/(100 - 0.4) # 3rd element
[1] 99.9021
> 100 * 99.9021/(100 + 0.8) # 4th element
[1] 99.10923
> 100 * 99.10923/(100 + 1.2) # 5th element
[1] 97.93402
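As a cross-check of the arithmetic above, here is a minimal base R sketch that runs the recurrence as an explicit loop and compares it with the Reduce fold. The names y_loop and y_fold are illustrative only and do not come from the original answer.

```r
# Rates ordered by descending year, as in the question's data.
rate <- c(0.5, -0.4, 0.8, 1.2, 1.1)

# Explicit loop over the recurrence y[i] = 100 * y[i-1] / (100 + rate[i-1]).
y_loop <- numeric(length(rate))
y_loop[1] <- 100
for (i in seq_along(rate)[-1]) {
  y_loop[i] <- 100 * y_loop[i - 1] / (100 + rate[i - 1])
}

# The same recurrence as a fold; the last rate is dropped because
# no row follows it.
y_fold <- Reduce(function(u, v) 100 * u / (100 + v),
                 rate[-length(rate)], init = 100, accumulate = TRUE)

# Both approaches should agree up to floating-point tolerance.
print(all.equal(y_loop, y_fold))
```

The small differences in the last digit of the hand-computed values above come from feeding rounded intermediate results back into the next step.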
Answer 1 (score: 3)
Try cumprod, as follows:
x %>% mutate(y = 100 * cumprod(100 / (100 + lag(rate, default = 0))))
which gives:
yr rate y
1 2016 0.5 100.00000
2 2015 -0.4 99.50249
3 2014 0.8 99.90210
4 2013 1.2 99.10922
5 2012 1.1 97.93401
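The reason cumprod works can be seen by unrolling the recurrence from the question. Writing r_i for the rate in row i (notation introduced here for illustration), each step multiplies the previous value by a constant factor, so the result is a running product:

```latex
\begin{aligned}
y_1 &= 100, \\
y_i &= y_{i-1} \cdot \frac{100}{100 + r_{i-1}} \quad (i \ge 2), \\
\Rightarrow\; y_i &= 100 \prod_{j=1}^{i-1} \frac{100}{100 + r_j}.
\end{aligned}
```

Using lag(rate, default = 0) makes the factor for the first row equal to 100 / (100 + 0) = 1, which is why the first value stays at 100.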
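Regarding databases, I doubt that dplyr can do this, but you can run SQL directly against the database. Below is an example using sqldf with the SQLite backend. The same code also works with the H2 database backend.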
library(sqldf)
sqldf("select a.yr, a.rate, 100 * coalesce(exp(sum(log(100/(100 + b.rate)))), 1) y
from x a left join x b on a.yr < b.yr group by a.yr
order by a.yr desc")
which gives:
yr rate y
1 2016 0.5 100.00000
2 2015 -0.4 99.50249
3 2014 0.8 99.90210
4 2013 1.2 99.10922
5 2012 1.1 97.93401
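The query above uses exp(sum(log(...))) because SQL has no product aggregate. The same trick can be sketched in dplyr with cumsum in place of cumprod. This is only an in-memory illustration: whether the expression translates to SQL depends on the backend supporting window functions plus EXP and LOG, which is an assumption that was not checked here. The column names f, y_prod and y_log are illustrative.

```r
library(dplyr)

# Sample data ordered by descending year, as in the question.
x <- data.frame(
  yr = c(2016, 2015, 2014, 2013, 2012),
  rate = c(0.5, -0.4, 0.8, 1.2, 1.1)
)

res <- x %>%
  mutate(
    # Per-row factor; the first row gets 100 / (100 + 0) = 1.
    f = 100 / (100 + lag(rate, default = 0)),
    # Running product computed directly.
    y_prod = 100 * cumprod(f),
    # Same running product via logs: a cumulative sum, then exp.
    y_log = 100 * exp(cumsum(log(f)))
  )

# The two columns should agree up to floating-point tolerance.
print(all.equal(res$y_prod, res$y_log))
```

The log form is only valid while every factor is positive, that is, while rate stays above -100.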
Answer 2 (score: 0)
Another option might be to use a for loop:
library(dplyr)
#initialize column "y"
x$y <- NA
#process one row at a time
for (i in seq(nrow(x))) {
x[i,] <- (x[seq(i),] %>%
mutate(y = ifelse(yr==max(yr), 100, 100 * lag(y) / (100 + lag(rate)))))[i,]
}
x
The output is:
yr rate y
1 2016 0.5 100.00000
2 2015 -0.4 99.50249
3 2014 0.8 99.90210
4 2013 1.2 99.10922
5 2012 1.1 97.93401
Sample data:
x <- structure(list(yr = c(2016, 2015, 2014, 2013, 2012), rate = c(0.5,
-0.4, 0.8, 1.2, 1.1)), class = "data.frame", row.names = c(NA,
-5L), .Names = c("yr", "rate"))