Question

My dataset looks like this.

Proc1   Proc2   Proc3   Count
AAZ     BLA     C       5
D       AAZ     E       7
A       G       F       1
T       X       Y       10

I have another vector, shown below.

Procs <- c("A", "B")

I want to keep only the rows that contain both A and B somewhere in the first 3 columns. My desired output is as follows.

Proc1   Proc2   Proc3   Count
AAZ     BLA     C       5

Please let me know if there is a good way to achieve this. I tried using %like% inside an apply function but could not get the desired result.

Answer 1

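Here is a method using sapply, rowSums, and grepl. Two separate calls to grepl check for the presence of "A" and of "B". sapply runs each check over the first three columns of the data.frame and returns a logical matrix. rowSums sums each matrix by row, giving the number of cells in that row that contain the pattern. The two sums are multiplied, so the product is zero if either "A" or "B" is missing from a row. Finally, check whether the product is greater than 0.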

keepers <- rowSums(sapply(df[1:3], function(x) grepl("A", x))) * 
           rowSums(sapply(df[1:3], function(x) grepl("B", x))) > 0

df[keepers,]
  Proc1 Proc2 Proc3 Count
1   AAZ   BLA     C     5

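Making this more dynamic is possible, though a bit ugly. You can wrap the rowSums call in an outer sapply and feed that sapply the vector of patterns. This returns a matrix of row sums with one column per pattern. Then use apply to apply prod to each row and check for positive values.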

keepers <- apply(sapply(c("A", "B"),
                        function(i) rowSums(sapply(df[1:3], function(x) grepl(i, x)))),
                 1, prod) > 0

keepers
[1]  TRUE FALSE FALSE FALSE
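
As a variation on the dynamic version above, the product can be replaced by a direct count of how many patterns were found in each row. This is only a sketch: the data frame is rebuilt from the table in the question, and the names hits and keepers_all are made up for illustration. fixed = TRUE treats each pattern as a literal string, which is an assumption about what the asker wants.

```r
# Data rebuilt from the question (assumes character columns, not factors)
df <- data.frame(
  Proc1 = c("AAZ", "D", "A", "T"),
  Proc2 = c("BLA", "AAZ", "G", "X"),
  Proc3 = c("C", "E", "F", "Y"),
  Count = c(5, 7, 1, 10),
  stringsAsFactors = FALSE
)
Procs <- c("A", "B")

# One logical column per pattern: TRUE if any of the first three
# columns of that row contains the pattern
hits <- sapply(Procs, function(p) {
  rowSums(sapply(df[1:3], grepl, pattern = p, fixed = TRUE)) > 0
})

# Keep a row only if every pattern was found somewhere in it
keepers_all <- rowSums(hits) == length(Procs)
df[keepers_all, ]
```

Comparing against length(Procs) means the same code works unchanged if more patterns are added to Procs.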

Answer 2

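We loop over the 'Proc' columns and check whether each element contains both 'A' and 'B', which gives a list of logical vectors. Reduce with `|` collapses that list into a single logical vector that is TRUE for a row if any of its elements meets the condition, and that vector is used to subset the rows of the dataset.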

pat <- paste(paste(Procs, collapse=".*"), paste(rev(Procs), collapse=".*"), sep="|")
df1[Reduce(`|`, lapply(df1[grep("Proc", names(df1))], grepl, pattern = pat)),]
#  Proc1 Proc2 Proc3 Count
#1   AAZ   BLA     C     5

Another option is to paste the elements of each row together and do a single grep.

pat <- paste(paste(Procs, collapse="[^,]*"), paste(rev(Procs), collapse="[^,]*"), sep="|")
df1[grep(pat, do.call(paste, c(df1[grep("Proc", names(df1))], sep=","))),]
#  Proc1 Proc2 Proc3 Count
#1   AAZ   BLA     C     5
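
To see what the two patterns in this answer match, it can help to build them and test them on a few strings. The sketch below uses the hypothetical names pat_cell and pat_row for the two patterns. As far as the regular expressions go, both require 'A' and 'B' to appear within the same cell, which is stricter than finding them anywhere in the row. The sample row gives the same result either way because "BLA" contains both letters.

```r
Procs <- c("A", "B")

# Pattern used per cell: A followed by B, or B followed by A
pat_cell <- paste(paste(Procs, collapse = ".*"),
                  paste(rev(Procs), collapse = ".*"), sep = "|")
pat_cell
# [1] "A.*B|B.*A"
grepl(pat_cell, c("AAZ", "BLA", "AGF"))
# [1] FALSE  TRUE FALSE

# Pattern used on comma-joined rows: [^,]* cannot cross a cell boundary
pat_row <- paste(paste(Procs, collapse = "[^,]*"),
                 paste(rev(Procs), collapse = "[^,]*"), sep = "|")
grepl(pat_row, c("AAZ,BLA,C", "A,B,C"))
# [1]  TRUE FALSE
```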

Data

Procs <- c("A", "B")
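
Only Procs is listed here, so the data frame has to be rebuilt from the table in the question to run the code. A minimal sketch, assuming the Proc columns are plain character vectors (stringsAsFactors = FALSE) and Count is numeric:

```r
# Rebuilt from the table in the question; column types are an assumption
df1 <- data.frame(
  Proc1 = c("AAZ", "D", "A", "T"),
  Proc2 = c("BLA", "AAZ", "G", "X"),
  Proc3 = c("C", "E", "F", "Y"),
  Count = c(5, 7, 1, 10),
  stringsAsFactors = FALSE
)

# The other answers refer to the same data as df
df <- df1
```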

Answer 3

Procs <- c("A", "B")

# Concatenate the columns to search into one string per row. Thanks to @DavidArenburg for the improvements
xxx = do.call(paste0, df[1:3])
#> xxx
#[1] "AAZBLAC" "DAAZE"   "AGF"     "TXY"   

# Run grepl on that vector once per pattern. If a row's sum of matches equals
# length(Procs), every pattern in Procs was found in that row's string.

ind <- which(rowSums(sapply(Procs, grepl, xxx, fixed = TRUE)) == length(Procs))
df[ind,]
#   Proc1 Proc2 Proc3 Count
#1:   AAZ   BLA     C     5
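
One caveat with concatenating columns using paste0: for single-character patterns such as "A" and "B" it is harmless, but a multi-character pattern could match across a column boundary. The sketch below uses a made-up one-row data frame (cols) and a tab separator to illustrate the idea. The choice of separator is an assumption and should be a character that cannot occur in the data or in the patterns.

```r
# Hypothetical one-row example: neither cell contains "AB"
cols <- data.frame(Proc1 = "XA", Proc2 = "BY", stringsAsFactors = FALSE)

joined_plain <- do.call(paste0, cols)               # "XABY"
joined_sep   <- do.call(paste, c(cols, sep = "\t")) # "XA\tBY"

grepl("AB", joined_plain, fixed = TRUE)
# [1] TRUE   (spurious match spanning two columns)
grepl("AB", joined_sep, fixed = TRUE)
# [1] FALSE
```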

Check whether the first three columns of each row contain substrings
