I have a dataset that looks like this:
dat <- structure(list(year = c(2003, 2004, 2005, 2006, 2007, 2008, 2009,
2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017), CD = c(246.74,
271.25, 295.21, 307.46, 405.82, 391.65, 439.1, 538.39, 549.27,
559.94, 510.51, 516.14, 480.25, 472.18, 460.56), Growth = c(1.17,
0.94, 1.05, 0.95, 1, 1.04, 1.09, 1.08, 1, 1.08, 0.97, 0.99, 1.06,
0.99, 0.99)), .Names = c("year", "CD", "Growth"), class = "data.frame", row.names = c(NA,
-15L))
It prints as:
year CD Growth
16 2003 246.74 1.17
17 2004 271.25 0.94
18 2005 295.21 1.05
19 2006 307.46 0.95
20 2007 405.82 1.00
21 2008 391.65 1.04
22 2009 439.10 1.09
23 2010 538.39 1.08
24 2011 549.27 1.00
25 2012 559.94 1.08
26 2013 510.51 0.97
27 2014 516.14 0.99
28 2015 480.25 1.06
29 2016 472.18 0.99
30 2017 460.56 0.99
What I need to do is create a new column, called KD, which has the following values:
For 2007: CD
For all years after 2007: KD of the year before * Growth of the current year
For all years before 2007: KD of the following year / Growth of the current year
In other words, 2007 is the reference year. KD[year == 2007] should be 405.82, KD[year == 2008] should be 422.05 (405.82 * 1.04), and KD[year == 2009] should be 460.04 (422.05 * 1.09).
Likewise, KD[year == 2006] should be 427.18 (405.82 / 0.95), and KD[year == 2005] should be 406.84 (427.18 / 1.05).
Is there a simple way to do this in R without resorting to a tedious for loop?
Answer 0 (score: 1)
We can do something like this:
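library(dplyr)

dat %>%
  mutate(
    # KD of the reference year is simply its CD
    KD_ref = CD[year == 2007],
    # cumulative product of 1/Growth for the years before 2007, built backwards from 2006
    Growth_cumdiv = c(rev(cumprod(rev(1/Growth[year < 2007]))),
                      rep(NA, sum(year >= 2007))),
    # cumulative product of Growth for the years after 2007
    Growth_cumprod = c(rep(NA, sum(year <= 2007)),
                       cumprod(Growth[year > 2007])),
    KD = case_when(
      year < 2007 ~ KD_ref*Growth_cumdiv,
      year == 2007 ~ KD_ref,
      year > 2007 ~ KD_ref*Growth_cumprod
    ))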
Result:
year CD Growth KD_ref Growth_cumdiv Growth_cumprod KD
1 2003 246.74 1.17 405.82 0.9115351 NA 369.9192
2 2004 271.25 0.94 405.82 1.0664960 NA 432.8054
3 2005 295.21 1.05 405.82 1.0025063 NA 406.8371
4 2006 307.46 0.95 405.82 1.0526316 NA 427.1789
5 2007 405.82 1.00 405.82 NA NA 405.8200
6 2008 391.65 1.04 405.82 NA 1.040000 422.0528
7 2009 439.10 1.09 405.82 NA 1.133600 460.0376
8 2010 538.39 1.08 405.82 NA 1.224288 496.8406
9 2011 549.27 1.00 405.82 NA 1.224288 496.8406
10 2012 559.94 1.08 405.82 NA 1.322231 536.5878
11 2013 510.51 0.97 405.82 NA 1.282564 520.4902
12 2014 516.14 0.99 405.82 NA 1.269738 515.2853
13 2015 480.25 1.06 405.82 NA 1.345923 546.2024
14 2016 472.18 0.99 405.82 NA 1.332464 540.7404
15 2017 460.56 0.99 405.82 NA 1.319139 535.3330
It can also be wrapped in a function:
library(dplyr)
library(rlang)
KD_calc <- function(DF, ref_year, KD_colname){
KD_colname_quo = quo_name(enquo(KD_colname))
DF %>%
mutate(KD_ref = CD[year == ref_year],
Growth_cumdiv = c(rev(cumprod(rev(1/Growth[year < ref_year]))),
rep(NA, sum(year >= ref_year))),
Growth_cumprod = c(rep(NA, sum(year <= ref_year)),
cumprod(Growth[year > ref_year])),
!!KD_colname_quo := case_when(
year < ref_year ~ KD_ref*Growth_cumdiv,
year == ref_year ~ KD_ref,
year > ref_year ~ KD_ref*Growth_cumprod,
)) %>%
select(-KD_ref, -Growth_cumdiv, -Growth_cumprod)
}
Result:
> KD_calc(dat, 2007, KD)
year CD Growth KD
1 2003 246.74 1.17 369.9192
2 2004 271.25 0.94 432.8054
3 2005 295.21 1.05 406.8371
4 2006 307.46 0.95 427.1789
5 2007 405.82 1.00 405.8200
6 2008 391.65 1.04 422.0528
7 2009 439.10 1.09 460.0376
8 2010 538.39 1.08 496.8406
9 2011 549.27 1.00 496.8406
10 2012 559.94 1.08 536.5878
11 2013 510.51 0.97 520.4902
12 2014 516.14 0.99 515.2853
13 2015 480.25 1.06 546.2024
14 2016 472.18 0.99 540.7404
15 2017 460.56 0.99 535.3330
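As a sanity check on the cumulative-product trick used in this answer, here is a minimal base R sketch that applies the recursive rule from the question literally (with a loop) and compares it against the closed form. The names `kd_loop` and `kd_closed` are made up for this illustration, and `dat` is rebuilt so the snippet runs on its own.

```r
# Rebuild the data from the question
dat <- data.frame(
  year   = 2003:2017,
  CD     = c(246.74, 271.25, 295.21, 307.46, 405.82, 391.65, 439.1, 538.39,
             549.27, 559.94, 510.51, 516.14, 480.25, 472.18, 460.56),
  Growth = c(1.17, 0.94, 1.05, 0.95, 1, 1.04, 1.09, 1.08, 1, 1.08, 0.97,
             0.99, 1.06, 0.99, 0.99)
)

n   <- nrow(dat)
ref <- which(dat$year == 2007)   # row index of the reference year
g   <- dat$Growth

# Literal transcription of the rule: start at the reference year,
# then walk forwards (multiply) and backwards (divide).
kd_loop <- numeric(n)
kd_loop[ref] <- dat$CD[ref]
for (i in seq_len(n - ref) + ref) kd_loop[i] <- kd_loop[i - 1] * g[i]
for (i in rev(seq_len(ref - 1)))  kd_loop[i] <- kd_loop[i + 1] / g[i]

# Closed form: the same values expressed through cumulative products
kd_closed <- c(
  dat$CD[ref] / rev(cumprod(rev(g[seq_len(ref - 1)]))),  # years before 2007
  dat$CD[ref],                                           # 2007 itself
  dat$CD[ref] * cumprod(g[seq_len(n - ref) + ref])       # years after 2007
)

all.equal(kd_loop, kd_closed)
round(kd_closed[dat$year %in% 2005:2009], 2)
```

The rounded values for 2005 to 2009 should match the ones the question asks for (406.84, 427.18, 405.82, 422.05, 460.04).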
Answer 1 (score: 1)
library(dplyr)
dat %>%
  mutate(l = CD[year == 2007]) %>%
  # s is 0 for the years before 2007 and 1 from 2007 onwards
  group_by(s = cumsum(year == 2007)) %>%
  mutate(KD = ifelse(s == 0, l/rev(cumprod(rev(Growth))), l*cumprod(Growth)), l = NULL) %>%
  data.frame()
year CD Growth s KD
1 2003 246.74 1.17 0 369.9192
2 2004 271.25 0.94 0 432.8054
3 2005 295.21 1.05 0 406.8371
4 2006 307.46 0.95 0 427.1789
5 2007 405.82 1.00 1 405.8200
6 2008 391.65 1.04 1 422.0528
7 2009 439.10 1.09 1 460.0376
8 2010 538.39 1.08 1 496.8406
9 2011 549.27 1.00 1 496.8406
10 2012 559.94 1.08 1 536.5878
11 2013 510.51 0.97 1 520.4902
12 2014 516.14 0.99 1 515.2853
13 2015 480.25 1.06 1 546.2024
14 2016 472.18 0.99 1 540.7404
15 2017 460.56 0.99 1 535.3330
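One caveat, as far as I can tell from the code: the second group starts at the reference year itself, so `l*cumprod(Growth)` also multiplies by the reference year's own Growth. That is harmless here because Growth is 1.00 in 2007, but it would shift every value for a reference year whose Growth is not 1. A small base R sketch of the difference, using 2010 as a hypothetical alternative reference year (the names `naive` and `adjusted` are made up for this illustration):

```r
year   <- 2003:2017
cd     <- c(246.74, 271.25, 295.21, 307.46, 405.82, 391.65, 439.1, 538.39,
            549.27, 559.94, 510.51, 516.14, 480.25, 472.18, 460.56)
growth <- c(1.17, 0.94, 1.05, 0.95, 1, 1.04, 1.09, 1.08, 1, 1.08, 0.97,
            0.99, 1.06, 0.99, 0.99)

ref_year <- 2010                          # hypothetical: Growth is 1.08 here
ref <- which(year == ref_year)
g_from_ref <- growth[ref:length(year)]    # Growth from the reference year on

# Same pattern as the grouped cumprod: includes the reference year's Growth
naive <- cd[ref] * cumprod(g_from_ref)

# Variant: treat the reference year's factor as 1 so KD equals CD there
adjusted <- cd[ref] * cumprod(c(1, g_from_ref[-1]))

naive[1]     # 538.39 * 1.08, not the reference CD
adjusted[1]  # 538.39, the reference CD itself
```

In the grouped dplyr version, dividing by the first Growth of the group (`l*cumprod(Growth)/Growth[1]`) should have the same effect, though I have not run that variant.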