Calculate columns from object names that are years

Time: 2019-01-09 22:03:54

Tags: r object dplyr

I have this data:

library(tidyverse)

df <- tibble(year = c(2018L, 2019L, 2020L, 2021L, 2022L, 2023L, 2024L, 2018L, 2019L,
                      2020L, 2021L, 2022L, 2023L, 2024L),
             number = c(1000L, 2000L, 3000L, 4000L, 5000L, 6000L, 7000L, 1000L, 1100L,
                        1200L, 1300L, 1400L, 1500L, 1600L),
             area = c("a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b",
                      "b", "b"))

year_a <- 2019
year_b <- 2022

That I have transformed to this:

df2 <- df  %>% 
  filter(year %in% c(year_a, year_b)) %>% 
  spread(year, number) 

Which is this:

# A tibble: 2 x 3
  area  `2019` `2022`
  <chr>  <int>  <int>
1 a       2000   5000
2 b       1100   1400

I want to calculate the difference between year_b (2022) and year_a (2019) without typing those numbers in, because the years will change and are stored in the objects year_a and year_b.

I have tried this:

year_a_chr <- paste0("`", year_a, "`", sep = "")
year_b_chr <- paste0("`", year_b, "`", sep = "")

df2 %>% 
  mutate(growth = !!year_b_chr - !!year_a_chr)

Which gives me this error:

Error in mutate_impl(.data, dots) : Evaluation error: invalid argument type.

How would I solve this? Thanks.

2 answers:

Answer 0 (score: 2)

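Convert the year values to symbols with rlang::sym() and unquote them with !!. The attempt in the question fails because !! applied to a character string inserts the string itself, not a reference to a column, and the backticks pasted around the year become part of that string.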

df2 %>% 
    mutate(growth = !! rlang::sym(paste0(year_b)) - !! rlang::sym(paste0(year_a)))
# A tibble: 2 x 4
#  area  `2019` `2022` growth
#  <chr>  <int>  <int>  <int>
#1 a       2000   5000   3000
#2 b       1100   1400    300
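As a possible alternative to building symbols, the `.data` pronoun that dplyr exports accepts a plain string as a column name. The block below is a minimal sketch, not part of the original answer; it retypes df2 by hand so that it runs on its own, and the name `result` is only illustrative.

```r
library(dplyr)

# Same shape as df2 in the question, typed in directly so the block is self-contained
df2 <- tibble(
  area   = c("a", "b"),
  `2019` = c(2000L, 1100L),
  `2022` = c(5000L, 1400L)
)

year_a <- 2019
year_b <- 2022

# Backticks are parser syntax, not part of the name: the columns are "2019" and "2022"
names(df2)

# .data[[...]] looks a column up by string, so as.character() is the only conversion needed
result <- df2 %>%
  mutate(growth = .data[[as.character(year_b)]] - .data[[as.character(year_a)]])

result
```

Either form keeps the year numbers out of the mutate() call, so changing year_a or year_b is enough to recompute the difference for another pair of years.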

Answer 1 (score: 1)

If you don't mind a small extra step, you can make a lookup table that pairs each year with a label (e.g. "year_a"). You can then join it to your table and use those labels as column names after you spread the data. It may also scale more easily, for example if you need to expand the set of years you're working with.


library(tidyverse)

year_a <- 2019
year_b <- 2022

You can make a lookup table manually, like so:

# manually
tibble(
  key = c("year_a", "year_b"),
  year = c(year_a, year_b)
)
#> # A tibble: 2 x 2
#>   key     year
#>   <chr>  <dbl>
#> 1 year_a  2019
#> 2 year_b  2022

Or programmatically, using tibble::lst as a quick way to include the names of your year variables.

year_lookup <- lst(year_a, year_b) %>% 
  as_tibble() %>% 
  gather(key, value = year)

Then instead of filtering, use an inner join to keep just the values that appear in the lookup, and get their labels.

df2 <- df %>%
  inner_join(year_lookup, by = "year") %>%
  select(-year) %>%
  spread(key = key, value = number)

#> # A tibble: 2 x 3
#>   area  year_a year_b
#>   <chr>  <int>  <int>
#> 1 a       2000   5000
#> 2 b       1100   1400

After that, you have a way to do your calculations by pointing to columns like year_b instead of the year number.

df2 %>%
  mutate(diff = year_b - year_a)
#> # A tibble: 2 x 4
#>   area  year_a year_b  diff
#>   <chr>  <int>  <int> <int>
#> 1 a       2000   5000  3000
#> 2 b       1100   1400   300
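In current tidyr releases, spread() and gather() are superseded by pivot_wider() and pivot_longer(). The following is a sketch of the same lookup-table idea using pivot_wider(), assuming tidyr 1.0.0 or later; it rebuilds the data from the question so it can be run on its own, and it is a variation on this answer, not the answer's original code.

```r
library(dplyr)
library(tidyr)

# Data from the question
df <- tibble(
  year   = c(2018L, 2019L, 2020L, 2021L, 2022L, 2023L, 2024L,
             2018L, 2019L, 2020L, 2021L, 2022L, 2023L, 2024L),
  number = c(1000L, 2000L, 3000L, 4000L, 5000L, 6000L, 7000L,
             1000L, 1100L, 1200L, 1300L, 1400L, 1500L, 1600L),
  area   = rep(c("a", "b"), each = 7)
)

year_a <- 2019L
year_b <- 2022L

# Lookup table: one row per year of interest, labelled with the object's name
year_lookup <- tibble(
  key  = c("year_a", "year_b"),
  year = c(year_a, year_b)
)

df2 <- df %>%
  inner_join(year_lookup, by = "year") %>%   # keeps only the years in the lookup
  select(-year) %>%
  pivot_wider(names_from = key, values_from = number) %>%
  mutate(diff = year_b - year_a)             # inside mutate() these names refer to the new columns

df2
```

Inside mutate(), column names take precedence over objects in the calling environment, which is why `year_b - year_a` subtracts the two columns and not the two year values.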

Created on 2019-01-09 by the reprex package (v0.2.1)