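I have the following function, which takes a data frame and filters it on a given year. I would like to run this function for every year from 1997 to 2017, getting one data frame per year, and then store all 21 frames in a single list. How do I pass the years in when the variable names are backtick-quoted numbers such as `2012`, `2013`, and so on?
I know I should use some variant of lapply or purrr::map to get the list, but how do I pass something like the vector 1997:2017 in the form of quoted expressions?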
library(dplyr)  # tibble(), enquo(), group_by(), filter(), count(), %>%
library(purrr)  # map()

# Data frame, condensed just to years 1997 to 2001 for sample code.
df <- tibble(
  Asset = c("048 (NC4), LY", "059-3D, LY", "059-5F, LY"),
  `1997` = c(1, 1, 1), `1998` = c(1, 0, 1), `1999` = c(0, 1, 1),
  `2000` = c(0, 0, 1), `2001` = c(1, 1, 0),
  CELL_ID = c(174625, 170318, 170318)
)
# Returns a data frame with counts by cell for a single year
f <- function(tbl, year) {
  year <- enquo(year)
  tbl %>%
    as_tibble() %>%
    group_by(CELL_ID) %>%
    filter(!!year == 1) %>%
    count(!!year) %>%
    arrange(desc(n))
}
> f(df, `2001`)
#> A tibble: 2 x 3
#> Groups: CELL_ID [2]
#> CELL_ID `2001` n
#> <dbl> <dbl> <int>
#> 1 170318 1 1
#> 2 174625 1 1
What I want, in pseudocode:
# I've written the purrr::map call incorrectly here,
# but here's essentially the structure for how I want to run the
# function across years and return a list of dataframes for every year:
df %>% map_dfc(~ f(tbl = .x, year = list(`1997`, `1998`)))
# ^replaced w vec, or `1997`, `1998`, ... `2017`
# Assuming I fix the above call's syntax, the function I need most:
yearVec <- generateBacktickVector(1997:2017)
df %>% map_dfc(~ f(tbl = .x, year = yearVec))
Answer 0 (score: 2)
As mentioned, simply reshape your wide data frame into long form with tidyr::gather. Backticked names work there too, even in a range.
library(dplyr)
library(tidyr)

long_df <- df %>%
  gather(key = "year", value = "value", `1997`:`2001`) %>%
  filter(value > 0)
long_df
# # A tibble: 10 x 4
# Asset CELL_ID year value
# <chr> <dbl> <chr> <dbl>
# 1 048 (NC4), LY 174625. 1997 1.
# 2 059-3D, LY 170318. 1997 1.
# 3 059-5F, LY 170318. 1997 1.
# 4 048 (NC4), LY 174625. 1998 1.
# 5 059-5F, LY 170318. 1998 1.
# 6 059-3D, LY 170318. 1999 1.
# 7 059-5F, LY 170318. 1999 1.
# 8 059-5F, LY 170318. 2000 1.
# 9 048 (NC4), LY 174625. 2001 1.
# 10 059-3D, LY 170318. 2001 1.
Then use base::split on the year column to get a named list of data frames.
tibble_list <- split(long_df, long_df$year)
tibble_list
# $`1997`
# # A tibble: 3 x 4
# Asset CELL_ID year value
# <chr> <dbl> <chr> <dbl>
# 1 048 (NC4), LY 174625. 1997 1.
# 2 059-3D, LY 170318. 1997 1.
# 3 059-5F, LY 170318. 1997 1.
# $`1998`
# # A tibble: 2 x 4
# Asset CELL_ID year value
# <chr> <dbl> <chr> <dbl>
# 1 048 (NC4), LY 174625. 1998 1.
# 2 059-5F, LY 170318. 1998 1.
# $`1999`
# # A tibble: 2 x 4
# Asset CELL_ID year value
# <chr> <dbl> <chr> <dbl>
# 1 059-3D, LY 170318. 1999 1.
# 2 059-5F, LY 170318. 1999 1.
# $`2000`
# # A tibble: 1 x 4
# Asset CELL_ID year value
# <chr> <dbl> <chr> <dbl>
# 1 059-5F, LY 170318. 2000 1.
# $`2001`
# # A tibble: 2 x 4
# Asset CELL_ID year value
# <chr> <dbl> <chr> <dbl>
# 1 048 (NC4), LY 174625. 2001 1.
# 2 059-3D, LY 170318. 2001 1.
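In recent tidyr releases, gather() is documented as superseded by pivot_longer(). Below is a minimal sketch of the same reshape-and-split idea using pivot_longer(), assuming the sample df from the question (rebuilt here only so the snippet runs on its own). The row order of long_df differs from the gather() output above, but the split list should contain the same rows per year.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, repeated so the snippet is self-contained
df <- tibble(
  Asset = c("048 (NC4), LY", "059-3D, LY", "059-5F, LY"),
  `1997` = c(1, 1, 1), `1998` = c(1, 0, 1), `1999` = c(0, 1, 1),
  `2000` = c(0, 0, 1), `2001` = c(1, 1, 0),
  CELL_ID = c(174625, 170318, 170318)
)

# Wide to long: one row per Asset/year pair, keeping only rows flagged with 1
long_df <- df %>%
  pivot_longer(cols = `1997`:`2001`, names_to = "year", values_to = "value") %>%
  filter(value > 0)

# One data frame per year, named by year
tibble_list <- split(long_df, long_df$year)
names(tibble_list)
```

For the real data, the column range would presumably become `1997`:`2017`.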
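Answer 1 (score: 0)
The rlang function you are looking for is rlang::sym() (and its vectorised version rlang::syms()), which converts strings into symbols, i.e. the objects that backticked names such as `1997` stand for.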
library(purrr)  # map() and %>%
rlang::syms(as.character(1997:2001)) %>% map(~ f(tbl = df, year = !!.x))
# [[1]]
# # A tibble: 2 x 3
# # Groups: CELL_ID [2]
# CELL_ID `1997` n
# <dbl> <dbl> <int>
# 1 170318 1 2
# 2 174625 1 1
#
# [[2]]
# # A tibble: 2 x 3
# # Groups: CELL_ID [2]
# CELL_ID `1998` n
# <dbl> <dbl> <int>
# 1 170318 1 1
# 2 174625 1 1
#
# [[3]]
# # A tibble: 1 x 3
# # Groups: CELL_ID [1]
# CELL_ID `1999` n
# <dbl> <dbl> <int>
# 1 170318 1 2
#
# ...
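The list above is unnamed, so it is not obvious which element belongs to which year. One possible way around that, sketched here rather than taken from the answer, is to name the symbols with purrr::set_names() before mapping. The names `years` and `results` are only illustrative, and df and f are repeated from the question so the snippet is self-contained.

```r
library(dplyr)
library(purrr)

# Sample data and function from the question
df <- tibble(
  Asset = c("048 (NC4), LY", "059-3D, LY", "059-5F, LY"),
  `1997` = c(1, 1, 1), `1998` = c(1, 0, 1), `1999` = c(0, 1, 1),
  `2000` = c(0, 0, 1), `2001` = c(1, 1, 0),
  CELL_ID = c(174625, 170318, 170318)
)

f <- function(tbl, year) {
  year <- enquo(year)
  tbl %>%
    as_tibble() %>%
    group_by(CELL_ID) %>%
    filter(!!year == 1) %>%
    count(!!year) %>%
    arrange(desc(n))
}

years <- as.character(1997:2001)  # illustrative name

# Turn the strings into symbols, name each symbol by its year, then map;
# map() carries the names over to the resulting list
results <- rlang::syms(years) %>%
  set_names(years) %>%
  map(~ f(tbl = df, year = !!.x))

results[["1997"]]
```

With the names in place, a single year can be looked up as results[["2001"]] instead of by position.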
Note that map_dfc will not work here, because the resulting data frames have different numbers of rows.
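If a single combined table is wanted instead of a list, binding by rows is a possible alternative to binding by columns. The sketch below is an assumption on my part, not the asker's function: f_chr is a hypothetical variant of f that takes the year as a plain string and looks the column up through the .data pronoun, and dplyr::bind_rows() with .id stacks the per-year results while recording which year each row came from.

```r
library(dplyr)
library(purrr)

# Sample data from the question, repeated so the snippet is self-contained
df <- tibble(
  Asset = c("048 (NC4), LY", "059-3D, LY", "059-5F, LY"),
  `1997` = c(1, 1, 1), `1998` = c(1, 0, 1), `1999` = c(0, 1, 1),
  `2000` = c(0, 0, 1), `2001` = c(1, 1, 0),
  CELL_ID = c(174625, 170318, 170318)
)

# Hypothetical string-based variant of f(): `year` is a character scalar
f_chr <- function(tbl, year) {
  tbl %>%
    filter(.data[[year]] == 1) %>%   # keep rows flagged for that year
    count(CELL_ID, sort = TRUE)      # rows per cell, largest count first
}

years <- as.character(1997:2001)

# Named list with one data frame per year
results <- map(set_names(years), ~ f_chr(df, .x))

# Stack the per-year tables; the list names end up in a `year` column
combined <- bind_rows(results, .id = "year")
combined
```

Row-binding tolerates the differing row counts because each year only contributes its own rows.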