Generate and pass a vector of 30 backticked years to a function using NSE (dplyr)

Time: 2018-09-10 18:50:25

Tags: r dplyr lapply purrr rlang

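I have the following function, which takes a data frame and filters it by one year. I want to run this function to return a data frame for each year from 1997 to 2017, and then store all of those frames in a single list. How do I pass in those years when the variable names are backticked numbers, such as `2012`, `2013`, etc.?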

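I know I should use some variant of lapply or purrr::map to get the list, but how do I pass something like the vector 1997:2017 in the form of quoted (backticked) expressions?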

library(dplyr)

# Data frame, condensed to years 1997 to 2001 for the sample code.
df <- tibble(Asset = c("048 (NC4), LY", "059-3D, LY", "059-5F, LY"),
             `1997` = c(1, 1, 1), `1998` = c(1, 0, 1), `1999` = c(0, 1, 1),
             `2000` = c(0, 0, 1), `2001` = c(1, 1, 0), CELL_ID = c(174625, 170318, 170318))

# Returns a data frame of counts by cell for a single year
f <- function(tbl, year) {
  year <- enquo(year)
  tbl %>% as_tibble() %>% group_by(CELL_ID) %>% filter(!!year == 1) %>%
    count(!!year) %>% arrange(desc(n))
}

> f(df, `2001`)       
#> A tibble: 2 x 3
#> Groups:   CELL_ID [2]
#>  CELL_ID `2001`     n
#>    <dbl>  <dbl> <int>
#> 1  170318      1     1
#> 2  174625      1     1

What I want, in pseudocode:

# I've written the purrr::map call incorrectly here, 
# but here's essentially the structure for how I want to run the       
# function across years and return a list of dataframes for every year:

df %>% map_dfc(~ f(tbl = .x, year = list(`1997`, `1998`)))
#                           ^ replaced with a vector, i.e. `1997`, `1998`, ... `2017`

# Assuming I fix the above call's syntax, the function I need most:
yearVec <- generateBacktickVector(1997:2017)
df %>% map_dfc(~ f(tbl = .x, year = yearVec))

2 Answers:

Answer 0 (score: 2)

As mentioned, just reshape your wide data frame to long format with tidyr::gather, which accepts backticked column names, even in a range.

library(dplyr)
library(tidyr)

long_df <- df %>% 
  gather(key="year", value="value", `1997`:`2001`) %>%
  filter(value > 0)

long_df
# # A tibble: 10 x 4
#    Asset         CELL_ID year  value
#    <chr>           <dbl> <chr> <dbl>
#  1 048 (NC4), LY 174625. 1997     1.
#  2 059-3D, LY    170318. 1997     1.
#  3 059-5F, LY    170318. 1997     1.
#  4 048 (NC4), LY 174625. 1998     1.
#  5 059-5F, LY    170318. 1998     1.
#  6 059-3D, LY    170318. 1999     1.
#  7 059-5F, LY    170318. 1999     1.
#  8 059-5F, LY    170318. 2000     1.
#  9 048 (NC4), LY 174625. 2001     1.
# 10 059-3D, LY    170318. 2001     1.
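In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). A minimal sketch of the same reshape with pivot_longer(), assuming a recent tidyr is installed; the backticked range selects the year columns exactly as it does in gather().

```r
library(dplyr)
library(tidyr)

# Sample data from the question (years condensed to 1997-2001)
df <- tibble(Asset = c("048 (NC4), LY", "059-3D, LY", "059-5F, LY"),
             `1997` = c(1, 1, 1), `1998` = c(1, 0, 1), `1999` = c(0, 1, 1),
             `2000` = c(0, 0, 1), `2001` = c(1, 1, 0),
             CELL_ID = c(174625, 170318, 170318))

# pivot_longer() stacks the selected year columns into name/value pairs;
# the column names ("1997", ...) become character values in `year`
long_df <- df %>%
  pivot_longer(cols = `1997`:`2001`, names_to = "year", values_to = "value") %>%
  filter(value > 0)

print(long_df)
```

One difference to expect: pivot_longer() walks the data row by row, so the rows may come out ordered by asset rather than by year. That makes no difference once the table is split by year.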

Then use base::split on the year column to get a named list of data frames.

tibble_list <- split(long_df, long_df$year)

tibble_list 
# $`1997`
# # A tibble: 3 x 4
#   Asset         CELL_ID year  value
#   <chr>           <dbl> <chr> <dbl>
# 1 048 (NC4), LY 174625. 1997     1.
# 2 059-3D, LY    170318. 1997     1.
# 3 059-5F, LY    170318. 1997     1.

# $`1998`
# # A tibble: 2 x 4
#   Asset         CELL_ID year  value
#   <chr>           <dbl> <chr> <dbl>
# 1 048 (NC4), LY 174625. 1998     1.
# 2 059-5F, LY    170318. 1998     1.

# $`1999`
# # A tibble: 2 x 4
#   Asset      CELL_ID year  value
#   <chr>        <dbl> <chr> <dbl>
# 1 059-3D, LY 170318. 1999     1.
# 2 059-5F, LY 170318. 1999     1.

# $`2000`
# # A tibble: 1 x 4
#   Asset      CELL_ID year  value
#   <chr>        <dbl> <chr> <dbl>
# 1 059-5F, LY 170318. 2000     1.

# $`2001`
# # A tibble: 2 x 4
#   Asset         CELL_ID year  value
#   <chr>           <dbl> <chr> <dbl>
# 1 048 (NC4), LY 174625. 2001     1.
# 2 059-3D, LY    170318. 2001     1.
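If you also need the per-cell counts that f() produces, one option is to map count() over the split list. A sketch, assuming dplyr, tidyr and purrr are installed; counts_by_year is an illustrative name. Unlike f(), the results are ungrouped and have no year column, since the year is carried by the list names instead.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Sample data from the question (years condensed to 1997-2001)
df <- tibble(Asset = c("048 (NC4), LY", "059-3D, LY", "059-5F, LY"),
             `1997` = c(1, 1, 1), `1998` = c(1, 0, 1), `1999` = c(0, 1, 1),
             `2000` = c(0, 0, 1), `2001` = c(1, 1, 0),
             CELL_ID = c(174625, 170318, 170318))

# Reshape to long format and keep only the cells flagged with 1
long_df <- df %>%
  gather(key = "year", value = "value", `1997`:`2001`) %>%
  filter(value > 0)

# One tibble per year, named "1997", "1998", ...
tibble_list <- split(long_df, long_df$year)

# Count rows per CELL_ID within each year, largest count first
counts_by_year <- map(tibble_list, ~ count(.x, CELL_ID, sort = TRUE))

print(counts_by_year[["1997"]])
```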

Answer 1 (score: 0)

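The rlang functions you are looking for are rlang::sym() and its vectorized version, rlang::syms(), which convert strings into symbols, i.e. the backticked names.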
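A sketch of the generateBacktickVector() the question asks for, built on rlang::syms(). generate_backtick_vector is a hypothetical name used only for illustration; the point is that a "backticked year" is just a symbol whose name is the string "1997", and !! can inject such a symbol wherever a bare column name is expected.

```r
library(rlang)

# Hypothetical helper standing in for the question's generateBacktickVector():
# converts the numbers to strings, then each string to a symbol such as `1997`
generate_backtick_vector <- function(years) {
  syms(as.character(years))
}

year_syms <- generate_backtick_vector(1997:2001)

# Each element is a symbol, the same object R builds when it parses `1997`
print(year_syms[[1]])

# !! injects the symbol into an expression as if it had been typed with
# backticks; expr() only builds the call, so f and df are not evaluated here
print(expr(f(df, !!year_syms[[1]])))
```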

library(purrr)

rlang::syms(as.character(1997:2001)) %>% map(~ f(tbl = df, year = !!.x))
# [[1]]
# # A tibble: 2 x 3
# # Groups:   CELL_ID [2]
#   CELL_ID `1997`     n
#     <dbl>  <dbl> <int>
# 1  170318      1     2
# 2  174625      1     1
# 
# [[2]]
# # A tibble: 2 x 3
# # Groups:   CELL_ID [2]
#   CELL_ID `1998`     n
#     <dbl>  <dbl> <int>
# 1  170318      1     1
# 2  174625      1     1
# 
# [[3]]
# # A tibble: 1 x 3
# # Groups:   CELL_ID [1]
#   CELL_ID `1999`     n
#     <dbl>  <dbl> <int>
# 1  170318      1     2
# 
# ...

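Note that map_dfc won't work here, because it binds the results column-wise and the data frames you get have different numbers of rows.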
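Putting it together: a self-contained sketch that runs f() over a vector of years and names the resulting list by year, so each tibble can be looked up as res[["1997"]]. It reuses the question's df and f(); years and res are illustrative names. Covering 1997:2017 only means changing `years`, assuming the data frame actually has those columns.

```r
library(dplyr)
library(purrr)

# Sample data and f() from the question
df <- tibble(Asset = c("048 (NC4), LY", "059-3D, LY", "059-5F, LY"),
             `1997` = c(1, 1, 1), `1998` = c(1, 0, 1), `1999` = c(0, 1, 1),
             `2000` = c(0, 0, 1), `2001` = c(1, 1, 0),
             CELL_ID = c(174625, 170318, 170318))

f <- function(tbl, year) {
  year <- enquo(year)
  tbl %>% as_tibble() %>% group_by(CELL_ID) %>% filter(!!year == 1) %>%
    count(!!year) %>% arrange(desc(n))
}

years <- as.character(1997:2001)

# syms() turns each string into a symbol; !!.x injects it into f()'s
# enquo() capture. map() keeps the tibbles in a list, so differing row
# counts are fine, and setNames() labels each element with its year.
res <- rlang::syms(years) %>%
  map(~ f(tbl = df, year = !!.x)) %>%
  setNames(years)

print(res[["1997"]])
```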