Using mutate to create a new variable whose column holds a single value picked by a condition in a tidy tibble

Asked: 2021-04-13 21:51:11

Tags: r dplyr tidyverse

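I am trying to create a new variable called cpi2000 that holds the year-2000 cpi value for every observation in a series (I have four series, hence the group_by), so that I can calculate an inflation adjustment factor. However, the code below only fills in the value for the year 2000 and leaves every other year as NA. Basically, I want cpi2000 to contain four repeated numbers, one per series.

Here is the head of my data: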

# Groups:   series_id [1]
  year  series_id value seasonal_adj        series_name                                     cpi2000
  <chr> <chr>     <dbl> <chr>               <chr>                                             <dbl>
1 2000  CPIAUCSL   172. seasonally adjusted US city average, all items, seasonally adjusted    172.
2 2001  CPIAUCSL   177. seasonally adjusted US city average, all items, seasonally adjusted     NA 
3 2002  CPIAUCSL   180. seasonally adjusted US city average, all items, seasonally adjusted     NA 
4 2003  CPIAUCSL   184  seasonally adjusted US city average, all items, seasonally adjusted     NA 
5 2004  CPIAUCSL   189. seasonally adjusted US city average, all items, seasonally adjusted     NA 
6 2005  CPIAUCSL   195. seasonally adjusted US city average, all items, seasonally adjusted     NA 
> 
cpi_values_tidy_clean <- cpi_values_tidy %>%
  separate(date,
           into = c("year"),
           sep = "-",
           extra = "drop") %>%   # keep only the year part of date, drop the rest
  group_by(series_id) %>%
  mutate(cpi2000 = if_else(year == 2000, value, value[2000])) %>%
  glimpse()

The resulting cpi2000 column looks like this:

[1] 172.192      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA 172.200      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA
[36]      NA      NA      NA      NA      NA      NA      NA 165.717      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA 165.725      NA      NA      NA      NA      NA      NA
[71]      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA      NA

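I think the best approach is an if_else statement (case_when did not seem to work). If I could get the last argument of the if_else call (value[2000]) to return the value from the row where year == 2000, this would work, but I cannot figure out how to express that condition there.

The end goal is to create two variables, cpi2000 and cpi2019, so that I can create a third variable, cpi_adj = cpi2019 / cpi2000, to use as an inflation factor.

Any help would be greatly appreciated.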

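1 Answer:

Answer 0: (score: 0)

I realised that I can put the year condition inside the brackets, value[year == 2000], instead of subsetting by position as I did with value[2000]. Subsetting with 2000 produced NAs because no group has a 2000th row; value[1] would have worked, since the value I wanted was the first one in each group. Filtering by year is safer, though, because it lets me name the year I want regardless of row order. Below is the code that solved it, with its output: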

library(dplyr)   # group_by, mutate, if_else, glimpse
library(tidyr)   # separate

cpi_values_tidy_clean <- cpi_values_tidy %>%
  separate(date,
           into = c("year"),
           sep = "-",
           extra = "drop") %>%   # keep only the year part of date, drop the rest
  group_by(series_id) %>%
  # value[year == 2000] selects each series' value from the row where year is 2000
  mutate(cpi2000 = if_else(year == 2000, value, value[year == 2000])) %>%
  mutate(cpi2019 = if_else(year == 2019, value, value[year == 2019])) %>%
  glimpse()

head(cpi_values_tidy_clean)

 year  series_id value seasonal_adj        series_name                                     cpi2000 cpi2019
  <chr> <chr>     <dbl> <chr>               <chr>                                             <dbl>   <dbl>
1 2000  CPIAUCSL   172. seasonally adjusted US city average, all items, seasonally adjusted    172.    256.
2 2001  CPIAUCSL   177. seasonally adjusted US city average, all items, seasonally adjusted    172.    256.
3 2002  CPIAUCSL   180. seasonally adjusted US city average, all items, seasonally adjusted    172.    256.
4 2003  CPIAUCSL   184  seasonally adjusted US city average, all items, seasonally adjusted    172.    256.
5 2004  CPIAUCSL   189. seasonally adjusted US city average, all items, seasonally adjusted    172.    256.
6 2005  CPIAUCSL   195. seasonally adjusted US city average, all items, seasonally adjusted    172.    256.
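To illustrate why value[2000] gave NA while value[year == 2000] works, here is a minimal base R sketch. The three values are made up for illustration and stand in for a single series; they are not taken from the real data.

```r
# Toy data (made-up values, a single series) to contrast the two kinds of subsetting.
year  <- c("2000", "2001", "2002")   # character, like the year column after separate()
value <- c(172.2, 177.0, 179.9)

# Positional subsetting: asks for element number 2000, which does not exist,
# so R returns NA instead of raising an error.
by_position <- value[2000]

# Logical subsetting: keeps only the elements where the condition is TRUE.
# R coerces the number 2000 to "2000" for the comparison with a character vector.
by_condition <- value[year == 2000]

print(by_position)
print(by_condition)
```

Positional subsetting silently returns NA when the index is out of range, which is why the original code ran without complaint.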

If anyone knows a more elegant way to do this, or a way to do it with case_when, I would love to see it.
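One possible simplification, offered as a sketch rather than a definitive answer: since value[year == 2000] already returns a single number per group, the if_else wrapper may not be needed at all, because mutate recycles a length-one result across the group. This assumes each series has exactly one row for each of the two years; with zero or several matching rows, mutate would raise an error. The tibble cpi_toy, its series IDs "A" and "B", and its values are made up for illustration.

```r
library(dplyr)

# Toy stand-in for the cleaned data (made-up values, two series, two years each).
cpi_toy <- tibble(
  year      = c("2000", "2019", "2000", "2019"),
  series_id = c("A", "A", "B", "B"),
  value     = c(100, 150, 200, 250)
)

cpi_adjusted <- cpi_toy %>%
  group_by(series_id) %>%
  mutate(
    cpi2000 = value[year == "2000"],  # one value per series, recycled to every row
    cpi2019 = value[year == "2019"],
    cpi_adj = cpi2019 / cpi2000       # inflation factor from 2000 to 2019
  ) %>%
  ungroup()

print(cpi_adjusted)
```

Comparing year against the string "2000" keeps the types explicit, since year is a character column here. Calling ungroup() at the end avoids carrying the grouping into later steps.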