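I have a data frame with a value column and a corresponding year column. I want to create an additional column containing the ratio of each year's value to the value from 5 years earlier. For example, if the year is 2000, the "newval" column should hold the ratio of the 2000 value to the 1995 value. My data frame looks like the one below. Years may be missing, and both the "val" and "Year" columns may contain missing data.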
df2 = data.frame(code = c("AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG", "ARM", "ASM", "ATG", "AUS", "AUT", "AUT"),
                 val = c(123, 42, 23, 5, 42, 4, 23, 25, 42, 23, NA, 5563, 56),
                 Year = c(1990, 1991, 1992, 1993, 1991, 1995, 1996, 1997, 1991, 1992, 2000, 2001, 2002))
The final dataset should look like this:
df2 = data.frame(code = c("AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG", "ARM", "ASM", "ATG", "AUS", "AUT", "AUT"),
                 val = c(123, 42, 23, 5, 42, 4, 23, 25, 42, 23, NA, 5563, 56),
                 Year = c(1990, 1991, 1992, 1993, 1991, 1995, 1996, 1997, 1991, 1992, 2000, 2001, 2002),
                 newval = c(NA, NA, NA, NA, NA, 0.032520325, 0.547619048, 1.086956522, NA, NA, NA, 241.8695652, 2.24))
Answer 0 (score: 5)
In base R, we can use match to look up the value from five years earlier:
df2$new_val <- with(df2, val/val[match(Year - 5, Year)])
df2
#   code  val Year  new_val
#1   AFG  123 1990       NA
#2   AGO   42 1991       NA
#3   ALB   23 1992       NA
#4   AND    5 1993       NA
#5   ARB   42 1991       NA
#6   ARE    4 1995   0.0325
#7   ARG   23 1996   0.5476
#8   ARM   25 1997   1.0870
#9   ASM   42 1991       NA
#10  ATG   23 1992       NA
#11  AUS   NA 2000       NA
#12  AUT 5563 2001 241.8696
#13  AUT   56 2002   2.2400
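To make the indexing concrete, here is a minimal sketch of what match returns by itself on a toy vector. The names years, vals, idx and ratio are made up for illustration and are not part of the question's data.

```r
# Toy data (hypothetical, for illustration only)
years <- c(1990, 1991, 1995, 1996, 2000)
vals  <- c(10, 20, 5, NA, 8)

# For each year, the position of the first element equal to (year - 5),
# or NA when that earlier year is not present
idx <- match(years - 5, years)
idx
# [1] NA NA  1  2  3

# Indexing with NA gives NA, so rows without an earlier year
# (or with a missing value on either side) end up with an NA ratio
ratio <- vals / vals[idx]
ratio
# [1]  NA  NA 0.5  NA 1.6
```

Because match returns only the first matching position, a year that appears several times (such as 1991 in the question's data) contributes the value from its first row.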
Answer 1 (score: 1)
One possibility using the dplyr package is:
library(dplyr)
df2 %>% mutate(Year = Year + 5) %>% select(-code) %>% distinct() %>%
  left_join(df2, ., by = "Year", suffix = c("", "_old")) %>%
  mutate(newval = val / val_old) %>%
  select(-val_old)
   code  val Year       newval
1   AFG  123 1990           NA
2   AGO   42 1991           NA
3   ALB   23 1992           NA
4   AND    5 1993           NA
5   ARB   42 1991           NA
6   ARE    4 1995   0.03252033
7   ARG   23 1996   0.54761905
8   ARM   25 1997   1.08695652
9   ASM   42 1991           NA
10  ATG   23 1992           NA
11  AUS   NA 2000           NA
12  AUT 5563 2001 241.86956522
13  AUT   56 2002   2.24000000
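The pipeline above shifts every year forward by 5 and joins the shifted values back onto the original rows. For readers without dplyr, the same idea can be sketched in base R with merge. This is only an illustration under stated assumptions: the helper names old and row_id are invented here, and the sketch assumes each year maps to a single value, as it does in the question's data.

```r
df2 <- data.frame(
  code = c("AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG", "ARM", "ASM", "ATG", "AUS", "AUT", "AUT"),
  val  = c(123, 42, 23, 5, 42, 4, 23, 25, 42, 23, NA, 5563, 56),
  Year = c(1990, 1991, 1992, 1993, 1991, 1995, 1996, 1997, 1991, 1992, 2000, 2001, 2002)
)

# Lookup table: each value relabelled with the year that will use it (Year + 5)
old <- unique(data.frame(Year = df2$Year + 5, val_old = df2$val))

# merge() sorts by the key, so remember the original row order (hypothetical helper column)
df2$row_id <- seq_len(nrow(df2))

# Left join: keep every original row, attach the 5-years-earlier value if any
out <- merge(df2, old, by = "Year", all.x = TRUE)
out <- out[order(out$row_id), ]

out$newval <- out$val / out$val_old
out[, c("code", "val", "Year", "newval")]
```

If a year carried several different values, the lookup table would keep one row per distinct value and the join would return extra rows, so that case would need to be resolved first (for example by choosing one value per year).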