Ratio of values between two years

Time: 2019-05-03 08:20:47

Tags: r dataframe dplyr data.table tidyr

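I have a data frame with a value column and a corresponding year column. I want to create an additional column holding the ratio of each year's value to the value five years earlier. For example, if the year is 2000, the "newval" column should contain the ratio of the 2000 value to the 1995 value. My data frame looks like the one below. Some years may be missing, and both the value and year columns may contain missing data (NA).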

df2 = data.frame(code = c("AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG", "ARM", "ASM", "ATG", "AUS", "AUT","AUT"),
            val = c(123, 42, 23, 5, 42, 4, 23, 25, 42, 23, NA, 5563,56), 
            Year = c(1990, 1991, 1992, 1993, 1991, 1995, 1996, 1997, 1991, 1992, 2000, 2001,2002))

The final dataset should look like this:

 df2 = data.frame(code = c("AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG", "ARM", "ASM", "ATG", "AUS", "AUT","AUT"),
             val= c(123, 42, 23, 5, 42, 4, 23, 25, 42, 23, NA, 5563,56),
             Year = c(1990, 1991, 1992, 1993, 1991, 1995, 1996, 1997, 1991, 1992, 2000, 2001,2002), newval=c(NA,NA,NA,NA,NA,0.032520325,0.547619048,1.086956522,NA,NA,NA,241.8695652,2.24))

2 answers:

Answer 0: (score: 5)

In base R, we can use match to find, for each row, the row whose Year is five years earlier:

df2$new_val <- with(df2, val/val[match(Year - 5, Year)])

df2
#   code  val Year  new_val
#1   AFG  123 1990       NA
#2   AGO   42 1991       NA
#3   ALB   23 1992       NA
#4   AND    5 1993       NA
#5   ARB   42 1991       NA
#6   ARE    4 1995   0.0325
#7   ARG   23 1996   0.5476
#8   ARM   25 1997   1.0870
#9   ASM   42 1991       NA
#10  ATG   23 1992       NA
#11  AUS   NA 2000       NA
#12  AUT 5563 2001 241.8696
#13  AUT   56 2002   2.2400
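
To see why the one-liner works, it can help to split it into steps. The sketch below is only an illustration on the question's df2; the intermediate names idx and base_val are hypothetical and not part of the answer. One assumption worth stating: match returns the position of the first matching element only, so when a year occurs several times (as 1991 does here) the value from its first row is used. That is harmless for this data because the repeated years carry identical values.

```r
# Rebuild the question's data so the block runs on its own.
df2 <- data.frame(
  code = c("AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG", "ARM", "ASM", "ATG", "AUS", "AUT", "AUT"),
  val  = c(123, 42, 23, 5, 42, 4, 23, 25, 42, 23, NA, 5563, 56),
  Year = c(1990, 1991, 1992, 1993, 1991, 1995, 1996, 1997, 1991, 1992, 2000, 2001, 2002)
)

# For each row, the position of the first row whose Year is five years
# earlier, or NA when no such year exists in the data.
idx <- match(df2$Year - 5, df2$Year)

# Look up the baseline value; indexing with NA yields NA.
base_val <- df2$val[idx]

# Ratio of the current value to the value five years earlier.
df2$new_val <- df2$val / base_val
```

Rows without a baseline year, and rows where either value is NA, end up with NA in new_val, which matches the output shown above.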

Answer 1: (score: 1)

One possibility using the dplyr package is:

library(dplyr)

# Shift every Year forward by 5 so each baseline row joins to the year that uses it
df2 %>% mutate(Year = Year + 5) %>% select(-code) %>% distinct() %>% 
  left_join(df2, ., by = "Year", suffix = c("", "_old")) %>% 
  mutate(newval = val / val_old) %>% 
  select(-val_old)

   code  val Year       newval
1   AFG  123 1990           NA
2   AGO   42 1991           NA
3   ALB   23 1992           NA
4   AND    5 1993           NA
5   ARB   42 1991           NA
6   ARE    4 1995   0.03252033
7   ARG   23 1996   0.54761905
8   ARM   25 1997   1.08695652
9   ASM   42 1991           NA
10  ATG   23 1992           NA
11  AUS   NA 2000           NA
12  AUT 5563 2001 241.86956522
13  AUT   56 2002   2.24000000
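
One caveat with the join approach: distinct() only removes exact duplicate (val, Year) pairs, so if the same baseline year appeared with two different values, the left join would return one output row per match. The sketch below shows one possible way to guard against that by reducing the baseline to a single value per year first. The toy data and the names toy, baseline and result are hypothetical, and picking the first value per year is an assumption chosen to mirror what match would do, not something the answer specifies.

```r
library(dplyr)

# Hypothetical toy data: the baseline year 1991 has two different values.
toy <- data.frame(
  code = c("AAA", "BBB", "CCC"),
  val  = c(10, 20, 5),
  Year = c(1991, 1991, 1996)
)

# Keep one baseline value per year (here the first one), then shift the
# year forward by 5 so it lines up with the year that uses it.
baseline <- toy %>%
  group_by(Year) %>%
  summarise(val_old = first(val)) %>%
  mutate(Year = Year + 5)

# Because baseline has one row per Year, the join cannot duplicate rows.
result <- toy %>%
  left_join(baseline, by = "Year") %>%
  mutate(newval = val / val_old) %>%
  select(-val_old)
```

Here result keeps the three original rows, with newval equal to 5 / 10 for 1996 and NA for the two 1991 rows.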