Question

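I'm trying to conditionally filter a data frame to extract the rows of interest. What I'm trying to do differs from generic conditional filtering in that the rules are variable and apply to pairs of columns.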

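My reprex below mocks up a data.frame with 4 samples (Control, Drug_1, Drug_2 and Drug_3) and the pairwise comparisons between them, with each difference shown as a p_value. I want to use this code inside a function that compares more than four groups. I tried combining the filter conditions with the OR operator, but ended up with ugly code.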

My end goal is a filtered_df containing every row where the variables group1 and group2 hold one of the pairs in my comparisons list. Any help is appreciated!

Best, Atakan

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Make a mock data frame
gene <- "ABCD1"
group1 <- c("Control", "Control", "Control", "Drug_1", "Drug_1", "Drug_2")
group2 <- c("Drug_1", "Drug_2", "Drug_3", "Drug_2", "Drug_3", "Drug_3")
p_value <- c(0.4, 0.001, 0.003, 0.01, 0.3, 0.9)

df <- data.frame(gene, group1, group2, p_value)
df
#>    gene  group1 group2 p_value
#> 1 ABCD1 Control Drug_1   0.400
#> 2 ABCD1 Control Drug_2   0.001
#> 3 ABCD1 Control Drug_3   0.003
#> 4 ABCD1  Drug_1 Drug_2   0.010
#> 5 ABCD1  Drug_1 Drug_3   0.300
#> 6 ABCD1  Drug_2 Drug_3   0.900

# I'd like to keep the rows where group1 and group2 match one of the following pairs
comparisons <- list(c("Control", "Drug_1"), c("Control", "Drug_2"), c("Drug_2", "Drug_3"))


# I can filter by using one pair as follows:
filtered_df <- df %>%
  filter(group1 == comparisons[[1]][1] & group2 == comparisons[[1]][2])

filtered_df
#>    gene  group1 group2 p_value
#> 1 ABCD1 Control Drug_1     0.4

Created on 2018-06-29 by the reprex package (v0.2.0).

Answer 1

We can do this in a few ways.

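1) One way is to loop through the list ('comparisons'), filter the data for each pair, and bind the outputs together (map_df):

library(tidyverse)
# map_df() runs the filter once per pair and row-binds the results
map_df(comparisons, ~ df %>% filter(group1 == .x[1] & group2 == .x[2]))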
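The question mentions chaining the per-pair conditions with OR. A minimal base R sketch of that idea, assuming the mock df and comparisons from the question, generates the OR condition from the list instead of writing it by hand: one logical vector per pair, folded together with Reduce(`|`, ...). The names pair_hits and keep are illustrative only.

```r
# Mock data from the question
df <- data.frame(
  gene = "ABCD1",
  group1 = c("Control", "Control", "Control", "Drug_1", "Drug_1", "Drug_2"),
  group2 = c("Drug_1", "Drug_2", "Drug_3", "Drug_2", "Drug_3", "Drug_3"),
  p_value = c(0.4, 0.001, 0.003, 0.01, 0.3, 0.9)
)
comparisons <- list(c("Control", "Drug_1"), c("Control", "Drug_2"), c("Drug_2", "Drug_3"))

# One logical vector per pair: TRUE where the row matches that pair exactly
pair_hits <- lapply(comparisons, function(p) df$group1 == p[1] & df$group2 == p[2])

# OR the vectors together: TRUE where the row matches any pair
keep <- Reduce(`|`, pair_hits)

# Logical subsetting keeps df's original row order
filtered_df <- df[keep, ]
filtered_df
```

Because keep is a single logical vector with one entry per row of df, it can also be passed to dplyr as filter(df, keep).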

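2) Another option is to convert the list to a data.frame and do an inner_join with the original dataset (df):

do.call(rbind, comparisons) %>%             # rbind to a matrix
  as.data.frame() %>%                       # convert to a data.frame
  set_names(c("group1", "group2")) %>%      # change the column names
  inner_join(df, by = c("group1", "group2")) # and inner join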
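A closely related dplyr verb is semi_join(), which keeps the rows of df that have a match in the lookup table, returns only df's own columns in df's original order, and never duplicates rows. A sketch assuming the question's mock data; pairs_df is a hypothetical helper name:

```r
library(dplyr)

# Mock data from the question
df <- data.frame(
  gene = "ABCD1",
  group1 = c("Control", "Control", "Control", "Drug_1", "Drug_1", "Drug_2"),
  group2 = c("Drug_1", "Drug_2", "Drug_3", "Drug_2", "Drug_3", "Drug_3"),
  p_value = c(0.4, 0.001, 0.003, 0.01, 0.3, 0.9)
)
comparisons <- list(c("Control", "Drug_1"), c("Control", "Drug_2"), c("Drug_2", "Drug_3"))

# Two-column lookup table: first element of each pair -> group1, second -> group2
pairs_df <- data.frame(
  group1 = vapply(comparisons, `[`, character(1), 1),
  group2 = vapply(comparisons, `[`, character(1), 2)
)

# Filtering join: keep rows of df whose (group1, group2) appears in pairs_df
filtered_df <- semi_join(df, pairs_df, by = c("group1", "group2"))
filtered_df
```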

3) Or use merge from base R (similar to 2).
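A minimal base R sketch of option 3, assuming the question's mock data and mirroring the steps of option 2 (pairs_df is a hypothetical helper name). merge() defaults to an inner join (all = FALSE) on the columns named in by:

```r
# Mock data from the question
df <- data.frame(
  gene = "ABCD1",
  group1 = c("Control", "Control", "Control", "Drug_1", "Drug_1", "Drug_2"),
  group2 = c("Drug_1", "Drug_2", "Drug_3", "Drug_2", "Drug_3", "Drug_3"),
  p_value = c(0.4, 0.001, 0.003, 0.01, 0.3, 0.9)
)
comparisons <- list(c("Control", "Drug_1"), c("Control", "Drug_2"), c("Drug_2", "Drug_3"))

# rbind the pairs into a character matrix, then into a data.frame
pairs_df <- as.data.frame(do.call(rbind, comparisons), stringsAsFactors = FALSE)
names(pairs_df) <- c("group1", "group2")

# Inner join on the two key columns
filtered_df <- merge(df, pairs_df, by = c("group1", "group2"))
filtered_df
```

Unlike the dplyr versions, merge() puts the key columns first and, with the default sort = TRUE, orders the rows by those keys.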
