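I am trying to process data by group with dplyr, but it does not work as expected. Any help would be appreciated. A sample of the data is below. I want to keep the 2014 value of midfs1 and compute the remaining values from lag(midfs1) and value, that is, each year's midfs1 should equal the previous year's midfs1 plus the previous year's midfs1 times value. Here is my attempt: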
t3 = t2 %>%
  group_by(cz, btype) %>%
  mutate(midfs1 = ifelse(year == 2014, midfs1,
                         lag(midfs1) * value + lag(midfs1)))
t2 data:
cz btype year midfs value midfs1
1 College 2014 5.4254 0.007582767 5.4254
1 College 2015 5.4779 0.007582767 NA
1 College 2016 5.5191 0.007582767 NA
1 College 2017 5.5616 0.007582767 NA
1 College 2018 5.6097 0.007582767 NA
1 Grocery 2012 4.8267 0.002697526 NA
1 Grocery 2013 4.8205 0.002697526 NA
1 Grocery 2014 4.8583 0.002697526 4.8583
1 Grocery 2015 4.8966 0.002697526 NA
1 Grocery 2016 4.9556 0.002697526 NA
1 Grocery 2017 5.0258 0.002697526 NA
1 Grocery 2018 5.0982 0.002697526 NA
1 Grocery 2019 5.1514 0.002697526 NA
1 Grocery 2020 5.1976 0.002697526 NA
1 Grocery 2021 5.2338 0.002697526 NA
Answer 0 (score: 0)
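Inside mutate(), lag(midfs1) is evaluated on the original column, which is NA everywhere except 2014, so the attempt can fill in 2015 at most. Because value is constant within each group, the recursion reduces to compound growth from the 2014 value, so it can be solved as a compound growth problem: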
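library(dplyr)

# Assumes rows are ordered by year within each (cz, btype) group.
t3 <-
  t2 %>%
  group_by(cz, btype) %>%
  filter(year >= 2014) %>%          # drop years before the 2014 base value
  mutate(my_n = row_number(),       # 1 for 2014, 2 for 2015, ...
         # 2014 value grown by (1 + value) once per year since 2014
         midfs2 = midfs1[1] * (1 + value)^(my_n - 1))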
# result
Source: local data frame [13 x 8]
Groups: cz, btype
cz btype year midfs value midfs1 my_n midfs2
1 1 College 2014 5.4254 0.007582767 5.4254 1 5.425400
2 1 College 2015 5.4779 0.007582767 NA 2 5.466540
3 1 College 2016 5.5191 0.007582767 NA 3 5.507991
4 1 College 2017 5.5616 0.007582767 NA 4 5.549757
5 1 College 2018 5.6097 0.007582767 NA 5 5.591839
6 1 Grocery 2014 4.8583 0.002697526 4.8583 1 4.858300
7 1 Grocery 2015 4.8966 0.002697526 NA 2 4.871405
8 1 Grocery 2016 4.9556 0.002697526 NA 3 4.884546
9 1 Grocery 2017 5.0258 0.002697526 NA 4 4.897722
10 1 Grocery 2018 5.0982 0.002697526 NA 5 4.910934
11 1 Grocery 2019 5.1514 0.002697526 NA 6 4.924181
12 1 Grocery 2020 5.1976 0.002697526 NA 7 4.937465
13 1 Grocery 2021 5.2338 0.002697526 NA 8 4.950783
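
The closed form above relies on value being constant within each group. If the rate could differ from year to year, the recursion from the question would have to be rolled forward row by row. The block below is only a sketch of that idea in base R: compound_forward is a hypothetical helper name, not part of dplyr or of the original answer, and it assumes the first element is the base year and that rows are ordered by year.

```r
# Hypothetical helper (not from the original answer): roll a starting value
# forward one row at a time with x[i] = x[i - 1] * (1 + rate[i]).
compound_forward <- function(start, rate) {
  # rate[1] belongs to the base year, so it is not applied to the start value
  Reduce(function(prev, r) prev * (1 + r), rate[-1],
         accumulate = TRUE, init = start)
}

# College group from the sample data: 2014 base value, one rate per year 2014-2018
college <- compound_forward(5.4254, rep(0.007582767, 5))
print(round(college, 6))
```

In the grouped pipeline this could be called after the filter(year >= 2014) step as mutate(midfs2 = compound_forward(midfs1[1], value)). With a constant rate it should reproduce the midfs2 column shown above.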