Question

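I would like to filter (preferably with dplyr) data frames that have a variable number of columns. The data frames have several columns, some of which are named with the same suffix. What I want is to keep only the rows in which all of the columns sharing that suffix have the same value.

So I have the following data frame: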

  Consequence CANONICAL x_LOH y_LOH x3
1            x       YES False False 12
2            x        NO False False 43
3            x       YES False False 64
4            x        NO  True False 34
5            y       YES  True False 93
6            y        NO  True False 16
7            y       YES  True  True 32
8            y        NO  True  True 74
9            z       YES False  True 84
10           z        NO False  True 89
11           z       YES  True False 45
12           z        NO False False 67

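I want to filter the data frame and keep only the rows in which every column with the suffix (_LOH) is "True". (Note: this data frame has two such columns, but other data frames may have only one, three, or four columns with the suffix, and I need code that works in all of these cases.)

The desired output would be: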

7            y       YES  True  True 32
8            y        NO  True  True 74

Code:

library(dplyr)

# Dataframe:

DF <- data.frame(Consequence = c(rep("x",4),rep("y",4),rep("z",4)),
                       CANONICAL = rep(c("YES","NO"),6),
                       x_LOH = c(rep("False", 3), rep("True", 5), rep("False",2), "True","False"),
                       y_LOH = c(rep("False", 6), rep("True",4), rep("False",2)),
                       x3=c(12,43,64,34,93,16,32,74,84,89,45,67))

# This obviously does not work

cols = names(DF)[grepl("_LOH", names(DF))]
DF %>% filter
 (for(i in 1:length(cols)){
   cols[i] == "True"
})

Any ideas would be greatly appreciated.

Thanks

Answer 1

You can try:

DF %>%
 filter_at(vars(ends_with("_LOH")), all_vars(. == "True"))

  Consequence CANONICAL x_LOH y_LOH x3
1           y       YES  True  True 32
2           y        NO  True  True 74
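As a side note, the scoped verbs such as filter_at() are marked as superseded in recent dplyr releases. A minimal sketch of the same filter with if_all(), assuming dplyr 1.0.4 or later is installed (the data frame is rebuilt here only to keep the example self-contained):

```r
library(dplyr)

# Same data as in the question (12 rows).
DF <- data.frame(
  Consequence = rep(c("x", "y", "z"), each = 4),
  CANONICAL   = rep(c("YES", "NO"), 6),
  x_LOH = c(rep("False", 3), rep("True", 5), rep("False", 2), "True", "False"),
  y_LOH = c(rep("False", 6), rep("True", 4), rep("False", 2)),
  x3    = c(12, 43, 64, 34, 93, 16, 32, 74, 84, 89, 45, 67)
)

# if_all() is TRUE for a row only when the condition holds in every
# selected column, however many columns end with "_LOH".
res <- DF %>%
  filter(if_all(ends_with("_LOH"), ~ .x == "True"))

res$x3
# [1] 32 74
```

If rows where at least one _LOH column is "True" were wanted instead, if_any() would be the counterpart.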

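Similarly, in base R (drop = FALSE keeps the selection a data frame even when only one column matches, so rowSums() still works in that case):

ind <- endsWith(names(DF), "_LOH")
DF[rowSums(DF[, ind, drop = FALSE] == "True") == sum(ind), ]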

Answer 2

Using base R, we can select the columns whose names end with "_LOH" and keep the rows in which all of them are "True":

cols <- grep("_LOH$", names(DF))
DF[rowSums(DF[cols] == "True") == length(cols), ]

#  Consequence CANONICAL x_LOH y_LOH x3
#7           y       YES  True  True 32
#8           y        NO  True  True 74

or using apply:

DF[apply(DF[cols] == "True", 1, all), ]

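The following also works here, because all() coerces the strings "True" and "False" to logical values, but it emits a warning about that coercion: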

DF[apply(DF[cols], 1, all), ]
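Since the question asks for something that works with one, two, or more suffixed columns, the rowSums() approach can be wrapped in a small helper. This is only a sketch: the function name keep_all_true and its arguments are made up for illustration, and returning the data frame unchanged when no column matches is an assumption about the desired behaviour.

```r
# Hypothetical helper: keep rows where every column ending in `suffix`
# equals `value`.
keep_all_true <- function(df, suffix = "_LOH", value = "True") {
  cols <- which(endsWith(names(df), suffix))
  # Assumption: with no matching columns there is nothing to filter on.
  if (length(cols) == 0L) return(df)
  # df[cols] stays a data frame even for a single column, so rowSums() works.
  # na.rm = TRUE makes rows containing NA count as non-matching.
  hits <- rowSums(df[cols] == value, na.rm = TRUE) == length(cols)
  df[hits, , drop = FALSE]
}

# Toy data with one and with three suffixed columns.
one <- data.frame(id = 1:3, a_LOH = c("True", "False", "True"))
three <- data.frame(
  id    = 1:3,
  a_LOH = c("True", "True", "False"),
  b_LOH = c("True", "False", "False"),
  c_LOH = c("True", "True", "True")
)

ids_one   <- keep_all_true(one)$id    # 1 3
ids_three <- keep_all_true(three)$id  # 1
```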

Answer 3

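One more base R option. Each _LOH column is compared to "True" first, and the resulting logical vectors are then folded with &, so it works for any number of such columns (one, two, or more):

isT <- lapply(DF[endsWith(names(DF), "_LOH")], `==`, "True")
subset(DF, Reduce(`&`, isT))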

#   Consequence CANONICAL x_LOH y_LOH x3
# 7           y       YES  True  True 32
# 8           y        NO  True  True 74
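To check a Reduce()-based fold against a different number of suffixed columns, here is a small sketch. The data frame DF3 and the helper name all_true are made up for illustration. The comparison with "True" is done per column before folding, so the fold itself only ever combines logical vectors.

```r
# Hypothetical data with three suffixed columns.
DF3 <- data.frame(
  id    = 1:4,
  a_LOH = c("True", "True", "False", "True"),
  b_LOH = c("True", "False", "True", "True"),
  c_LOH = c("True", "True", "True", "False")
)

# Hypothetical helper: one logical vector per suffixed column, combined with &.
all_true <- function(df, suffix = "_LOH", value = "True") {
  flags <- lapply(df[endsWith(names(df), suffix)], `==`, value)
  Reduce(`&`, flags)
}

# Three suffixed columns: only row 1 is "True" everywhere.
ids_three <- DF3[all_true(DF3), "id"]  # 1

# A single suffixed column: Reduce() returns that column's flags unchanged.
DF1 <- DF3[c("id", "a_LOH")]
ids_one <- DF1[all_true(DF1), "id"]    # 1 2 4
```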

How can I filter a data frame with a variable number of columns in R?
