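I have a large data frame that contains boolean columns for more than 100 conditions (not an ideal setup, but I can't change it). I am trying to write a function that takes a variable number of condition columns and keeps only the rows where all of those conditions are 1 or all of them are 0.
Setup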
library(dplyr)
set.seed(123)
ID <- sample(1:5, 20, replace = TRUE)
Val <- round(runif(length(ID), 20, 40),0)
cond_1 <- sample(0:1, length(ID), replace = TRUE)
cond_2 <- sample(0:1, length(ID), replace = TRUE)
cond_3 <- sample(0:1, length(ID), replace = TRUE)
cond_4 <- sample(0:1, length(ID), replace = TRUE)
df <- data.frame(ID, Val, cond_1, cond_2, cond_3, cond_4, stringsAsFactors = FALSE)
Example of the desired functionality for any two columns:
filterTwoCols <- function(df, cols){
  # Select desired conditions
  df1 <- df %>%
    select(ID, Val, one_of(cols))
  #### Filter on all conditions == 0 or all conditions == 1
  df2 <- df1 %>%
    filter(.[,ncol(.)] == 1 & .[,ncol(.) - 1] == 1 |
           .[,ncol(.)] == 0 & .[,ncol(.) - 1] == 0)
  return(df2)
}
filterTwoCols(df, c('cond_1', 'cond_4'))
filterTwoCols(df, c('cond_3', 'cond_2'))
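What I'd like to do is pass any number of conditions (e.g. filterManyCols(df, c('cond_1', 'cond_3', 'cond_4'))), but I don't know how to do that without explicitly writing each one out in the filter (.[,ncol(.) - 2] == 1, .[,ncol(.) - 3] == 1, etc.). As written, the function breaks whenever the number of selected columns doesn't match the number of conditions hard-coded in the filter. Any ideas?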
Answer 0 (score: 2)
One option is filter_at:
library(tidyverse)
filterManyCols <- function(df, cols){
  # Select desired conditions
  # Not clear whether we need to subset the columns or get the filtered
  # full dataset columns
  # df <- df %>%
  #   select(ID, Val, one_of(cols))
  map_df(0:1, ~ df %>%
    filter_at(vars(one_of(cols)), all_vars(. == .x)))
}
filterManyCols(df, c('cond_1', 'cond_4'))
filterManyCols(df, c('cond_1', 'cond_2', 'cond_3'))
filterManyCols(df, c('cond_1', 'cond_2', 'cond_3', 'cond_4'))
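Note that the map_df version stacks the all-0 rows first and the all-1 rows second, so the original row order is not preserved. Since the scoped verbs such as filter_at are superseded in dplyr 1.0 and later, here is a rough sketch of two alternatives that keep the original order. It assumes dplyr >= 1.0.4 (for if_all) and that the condition columns contain only 0 and 1 with no NA values. The names toy, filterManyColsIfAll and filterManyColsBase are made up for illustration and are not part of the original post.

```r
library(dplyr)

# Small hand-made data frame (hypothetical values, for illustration only)
toy <- data.frame(
  ID     = 1:5,
  cond_1 = c(1, 0, 1, 0, 1),
  cond_2 = c(1, 0, 0, 0, 1),
  cond_3 = c(1, 0, 1, 1, 1)
)

# dplyr >= 1.0.4: keep rows where every selected column is 1,
# or every selected column is 0
filterManyColsIfAll <- function(df, cols) {
  df %>%
    filter(if_all(all_of(cols), ~ .x == 1) |
           if_all(all_of(cols), ~ .x == 0))
}

# Base R: for strictly 0/1 columns, the row sum is 0 when all are 0
# and length(cols) when all are 1 (assumes no NA values)
filterManyColsBase <- function(df, cols) {
  s <- rowSums(df[, cols, drop = FALSE])
  df[s == 0 | s == length(cols), , drop = FALSE]
}

res_dplyr <- filterManyColsIfAll(toy, c("cond_1", "cond_2", "cond_3"))
res_base  <- filterManyColsBase(toy, c("cond_1", "cond_2", "cond_3"))
```

On the toy data both calls keep the rows with ID 1, 2 and 5. The row-sum trick only works for strictly 0/1 columns; the if_all version compares each column explicitly, so it should be the safer choice if the columns could hold other values.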