R: filter rows by partial string match applied to multiple columns

Time: 2017-09-14 09:41:11

Tags: r filter dplyr grepl

Example dataset:

diag01 <- as.factor(c("S7211","J47","J47","K729","M2445","Z509","Z488","R13","L893","N318","L0311","S510","A047","D649"))
diag02 <- as.factor(c("K590","D761","J961","T501","M8580","R268","T831","G8240","B9688","G550","E162","T8902","E86","I849"))
diag03 <- as.factor(c("F058","M0820","E877","E86","G712","R32","A408","E888","G8220","C794","T68","L0310","M1094","D469"))
diag04 <- as.factor(c("E86","C845","R790","I420","G4732","R600","L893","R509","T913","C795","M8412","G8212","L891","L0311"))
diag05 <- as.factor(c("R001","N289","E876","E871","H659","R4589","N508","B99","I209","C773","T921","Q070","H919","L033"))
diag06 <- as.factor(c("I951","E877","S7240","I500","H901","E119","Z223","K590","I959","C509","G819","F719","Z290","R13"))

df <- data.frame(diag01, diag02, diag03, diag04, diag05, diag06)

I want to filter whole rows that have a partial string match anywhere in a given list of columns (e.g. diag01, diag02, ...). I can do this for a single column, e.g.:

junk <- filter(df, grepl(pattern="^E11|^E16|^E86|^E87|^E88", diag02))

But I need to apply it across multiple columns (the original dataset has 216 columns and > 1,000,000 rows). Among other options, I tried:

junk <- filter(df, grepl(pattern="^E11|^E16|^E86|^E87|^E88", df[,c(1:6)]))
junk <- apply(df, 1, function(r) any(r %in% grepl(pattern="^E11|^E16|^E86|^E87|^E88")))

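I need the entire row, and ideally I would like to limit the filter condition to a given list of columns, because values in other columns may start with the same partial strings.

I have really struggled to find a solution, but evidently my knowledge of R is insufficient.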

1 answer:

Answer 0 (score: 4)

Perhaps we need:

df %>%
   filter_all(any_vars(grepl(pattern="^(E11|E16|E86|E87|E88)", .)))
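Note that `filter_all` tests every column, while the question asks to limit the test to a given list of columns. One way this might be done is sketched below, assuming dplyr 1.0.4 or later (for `if_any`). The small data frame, the `other` column, and the names `pat` and `diag_cols` are made up for illustration and are not part of the original data.

```r
library(dplyr)

# Hypothetical stand-in for the question's data; `other` plays the role of
# a non-diagnosis column whose values may also start with a target prefix.
df <- data.frame(
  diag01 = c("S7211", "J47", "E86"),
  diag02 = c("K590", "E162", "T501"),
  other  = c("E119", "X", "Y"),
  stringsAsFactors = TRUE
)

pat <- "^(E11|E16|E86|E87|E88)"      # prefixes to match
diag_cols <- c("diag01", "diag02")   # only these columns are tested

# Keep a row if any of the selected columns matches the pattern.
res <- df %>%
  filter(if_any(all_of(diag_cols), ~ grepl(pat, .x)))
```

Here the first row is dropped even though `other` contains "E119", because `other` is not among the tested columns. On older dplyr versions, `filter_at(vars(diag01, diag02), any_vars(grepl(pat, .)))` expresses the same idea in the style of the call above.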

Alternatively, with purrr and dplyr:

library(dplyr)
library(purrr)
df %>%
   map(~grepl(pattern="^E11|^E16|^E86|^E87|^E88", .)) %>% 
   reduce(`|`) %>%
   df[.,]
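The same map-then-reduce idea can also be written without any packages. The following is a minimal base R sketch, again limited to a chosen set of columns; the toy data frame and the names `pat`, `cols`, `hits`, and `keep` are assumptions for illustration only.

```r
# Hypothetical stand-in for the question's data.
df <- data.frame(
  diag01 = c("S7211", "J47", "E86"),
  diag02 = c("K590", "E162", "T501"),
  other  = c("E119", "X", "Y"),
  stringsAsFactors = TRUE
)

pat  <- "^(E11|E16|E86|E87|E88)"
cols <- c("diag01", "diag02")   # columns to test

# One logical vector per selected column: TRUE where the value matches.
hits <- lapply(df[cols], function(col) grepl(pat, col))

# Combine the vectors element-wise with OR: TRUE if any column matched.
keep <- Reduce(`|`, hits)

# Subset whole rows; drop = FALSE keeps the result a data frame.
res <- df[keep, , drop = FALSE]
```

Because `grepl` runs once per column rather than once per row, this avoids the row-wise `apply` attempted in the question, which may matter with 216 columns and over a million rows.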