I have this data:
library(tidyverse)
df <- tibble(
  year = c(2018L, 2019L, 2020L, 2021L, 2022L, 2023L, 2024L,
           2018L, 2019L, 2020L, 2021L, 2022L, 2023L, 2024L),
  number = c(1000L, 2000L, 3000L, 4000L, 5000L, 6000L, 7000L,
             1000L, 1100L, 1200L, 1300L, 1400L, 1500L, 1600L),
  area = c("a", "a", "a", "a", "a", "a", "a",
           "b", "b", "b", "b", "b", "b", "b")
)
year_a <- 2019
year_b <- 2022
That I have transformed to this:
df2 <- df %>%
  filter(year %in% c(year_a, year_b)) %>%
  spread(year, number)
Which is this:
# A tibble: 2 x 3
  area  `2019` `2022`
  <chr>  <int>  <int>
1 a       2000   5000
2 b       1100   1400
I want to calculate the difference between year_b (2022) and year_a (2019) without typing those numbers in, because the years will change and are stored in the objects year_a and year_b.
I have tried this:
year_a_chr <- paste0("`", year_a, "`", sep = "")
year_b_chr <- paste0("`", year_b, "`", sep = "")
df2 %>%
  mutate(growth = !!year_b_chr - !!year_a_chr)
Which gives me this error:
Error in mutate_impl(.data, dots) : Evaluation error: invalid argument type.
How can I solve this? Thanks.
Answer 0 (score: 2)
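We convert the objects to symbols with rlang::sym() and unquote them with !!. In the attempt above, !! unquotes character strings, so mutate() tries to subtract one string from another instead of one column from another.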
library(dplyr)
df2 %>%
  mutate(growth = !! rlang::sym(paste0(year_b)) - !! rlang::sym(paste0(year_a)))
# A tibble: 2 x 4
# area `2019` `2022` growth
# <chr> <int> <int> <int>
#1 a 2000 5000 3000
#2 b 1100 1400 300
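A related option, offered here as an alternative sketch rather than as part of the answer above, is the .data pronoun, which looks a column up by its name given as a string. The snippet rebuilds a small stand-in for df2 so that it runs on its own, and it uses as.character() in place of paste0() purely for readability.

```r
library(dplyr)
library(tibble)

# Stand-in for df2 from the question (same values, built directly)
df2 <- tibble(
  area   = c("a", "b"),
  `2019` = c(2000L, 1100L),
  `2022` = c(5000L, 1400L)
)

year_a <- 2019
year_b <- 2022

# .data[[name]] fetches a column by a character name, so neither
# backticks nor an explicit symbol conversion is needed
result <- df2 %>%
  mutate(growth = .data[[as.character(year_b)]] - .data[[as.character(year_a)]])

result
```

Because the column names are computed from year_a and year_b at run time, changing those two objects is enough to point the calculation at different years.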
Answer 1 (score: 1)
If you don't mind a small extra step, you can make a lookup table that pairs each year with a label (e.g. "year_a"). You can then join it to your table and use those labels as column names after you spread the data. This may also scale more easily, for example if you need to expand the set of years you're working with.
library(tidyverse)
year_a <- 2019
year_b <- 2022
You can make a lookup table manually, like so:
# manually
tibble(
key = c("year_a", "year_b"),
year = c(year_a, year_b)
)
#> # A tibble: 2 x 2
#> key year
#> <chr> <dbl>
#> 1 year_a 2019
#> 2 year_b 2022
Or programmatically, using tibble::lst() as a quick way to include the names of your year variables:
year_lookup <- lst(year_a, year_b) %>%
  as_tibble() %>%
  gather(key, value = year)
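As a possible variation (not what the answer itself uses), the same lookup table can be sketched from a named vector with tibble::enframe(). The names have to be typed out here, which is the trade-off compared with lst() picking them up automatically.

```r
library(tibble)

year_a <- 2019
year_b <- 2022

# The vector's names become the `key` column and its values the `year` column
year_lookup <- enframe(
  c(year_a = year_a, year_b = year_b),
  name = "key",
  value = "year"
)

year_lookup
```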
Then instead of filtering, use an inner join to keep just the values that appear in the lookup, and get their labels.
df2 <- df %>%
  inner_join(year_lookup, by = "year") %>%
  select(-year) %>%
  spread(key = key, value = number)
df2
#> # A tibble: 2 x 3
#> area year_a year_b
#> <chr> <int> <int>
#> 1 a 2000 5000
#> 2 b 1100 1400
After that, you can do your calculations by referring to columns like year_b instead of the year number.
df2 %>%
  mutate(diff = year_b - year_a)
#> # A tibble: 2 x 4
#> area year_a year_b diff
#> <chr> <int> <int> <int>
#> 1 a 2000 5000 3000
#> 2 b 1100 1400 300
Created on 2019-01-09 by the reprex package (v0.2.1)
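For readers on tidyr 1.0.0 or later, where spread() and gather() are superseded, the same lookup-table approach can be sketched with pivot_wider(). This is an adaptation under that assumption, not the code from the answer; the data are rebuilt compactly with rep() and seq() so the block runs on its own.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Same data as in the question, built compactly
df <- tibble(
  year   = rep(2018:2024, times = 2),
  number = c(seq(1000L, 7000L, by = 1000L), seq(1000L, 1600L, by = 100L)),
  area   = rep(c("a", "b"), each = 7)
)

year_a <- 2019
year_b <- 2022

# Lookup table pairing each label with its year
year_lookup <- tibble(
  key  = c("year_a", "year_b"),
  year = c(year_a, year_b)
)

df2 <- df %>%
  inner_join(year_lookup, by = "year") %>%   # keep only the two years of interest
  select(-year) %>%
  pivot_wider(names_from = key, values_from = number) %>%
  mutate(diff = year_b - year_a)             # these names refer to the new columns

df2
```

Inside mutate(), year_b and year_a resolve to the columns created by pivot_wider() rather than to the global objects of the same name, because columns in the data take precedence.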