Question

Index   odx1    odx2    odx3    odx4    odx5
1       123     0       0       0       0
2       0       321     0       0       0
3       0       0       0       123     0
4       0       321     0       0       0
5       0       0       0       0       0

I've attached a sample of the dataset above. I want to filter across multiple columns in R to subset the rows that contain, for example, 123 or 321.

So far I have tried this with dplyr:

df %>% filter(., odx1==123 | odx2==123 | odx3==123 | odx4==123 | odx5==123 | odx1==321| odx2==321| odx3==321| odx4==321| odx5==321)

The above works, but is there a more concise way?

My actual dataset has columns odx1 to odx25, and I have a list of about 15 strings to filter on across roughly 100K rows.
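With 25 columns and about 15 values, the chained `|` conditions can be generated rather than typed out. A minimal sketch, assuming dplyr 1.0.4 or later (which provides `if_any()`) and that the values to look for sit in a vector; `targets` and the toy data frame are hypothetical stand-ins for the real data:

```r
library(dplyr)

# Hypothetical toy data mirroring the sample above (odx1-odx5 only)
df <- data.frame(
  Index = 1:5,
  odx1 = c(123, 0, 0, 0, 0),
  odx2 = c(0, 321, 0, 321, 0),
  odx3 = c(0, 0, 0, 0, 0),
  odx4 = c(0, 0, 123, 0, 0),
  odx5 = c(0, 0, 0, 0, 0)
)

# Hypothetical vector of values to match (the real one has ~15 entries)
targets <- c(123, 321)

# Keep a row if ANY odx* column holds one of the target values;
# starts_with("odx") also covers odx1..odx25 without listing them
result <- df %>% filter(if_any(starts_with("odx"), ~ .x %in% targets))
print(result)
```

Restricting the columns with `starts_with("odx")` keeps `Index` out of the comparison, which matters once row numbers reach values like 123 or 321.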

Edit:

The actual dataset contains random numeric strings; I only used 0 in the example above for visibility and simplicity.

Index   odx1    odx2    odx3    odx4    odx5
1       123     421     532     414     981
2       243     321     765     132     321
3       144     322     587     123     444
4       655     321     459     091     676
5       456     421     523     431     768
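Because the real columns hold numeric strings (note `091` above), it may be safer to read them as character and match against character targets, so leading zeros survive. A base-R sketch under that assumption; the inline copy of the table and the name `targets` are hypothetical:

```r
# Hypothetical copy of the edited sample, read as character so "091" keeps its leading zero
df <- read.table(text = "
Index odx1 odx2 odx3 odx4 odx5
1 123 421 532 414 981
2 243 321 765 132 321
3 144 322 587 123 444
4 655 321 459 091 676
5 456 421 523 431 768
", header = TRUE, colClasses = "character")

targets <- c("123", "321")

odx_cols <- grep("^odx", names(df))   # positions of odx1, odx2, ... (skips Index)
# One logical vector per column, then OR them together row-wise
hit <- Reduce(`|`, lapply(df[odx_cols], `%in%`, targets))
df[hit, ]
```

Rows 1 to 4 are kept; row 5 contains none of the targets.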

Answer 1

Following up on my comment:

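If the data always has this general format (i.e. you only want to get rid of observations consisting entirely of 0s), then this solution is a bit faster (in both keystrokes and computation time):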

df[rowSums(df[, -1]!=0)!=0,]
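To see how the one-liner works, it can be split into its two steps. A sketch on a hypothetical copy of the original all-zeros sample; note that this only drops all-zero rows and does not look for specific values, so on the edited data with random numbers it would keep every row:

```r
# Hypothetical copy of the original sample
df <- data.frame(
  Index = 1:5,
  odx1 = c(123, 0, 0, 0, 0),
  odx2 = c(0, 321, 0, 321, 0),
  odx3 = 0,
  odx4 = c(0, 0, 123, 0, 0),
  odx5 = 0
)

# df[, -1] != 0 is a logical matrix (TRUE where a cell is non-zero);
# rowSums() counts the TRUEs per row, so a count of 0 means "all zeros"
nonzero_per_row <- rowSums(df[, -1] != 0)

kept <- df[nonzero_per_row != 0, ]
kept
```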

Answer 2

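Alternatively, if you need to filter on an explicit set of values (you said you have 15 strings to filter on), you can use this to filter across all columns: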

library(dplyr)
conditions.to.match <- c(123, 321)
df %>% filter(Reduce('|', lapply(df, '%in%', conditions.to.match)))

(Idea from here.)
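The same `Reduce`/`lapply` idea works without dplyr. A base-R sketch on a hypothetical copy of the sample, which also leaves out the `Index` column: if `Index` is a plain row counter, as in the sample, then on 100K rows the rows numbered 123 and 321 would otherwise match on `Index` alone.

```r
# Hypothetical copy of the original sample
df <- data.frame(
  Index = 1:5,
  odx1 = c(123, 0, 0, 0, 0),
  odx2 = c(0, 321, 0, 321, 0),
  odx3 = 0,
  odx4 = c(0, 0, 123, 0, 0),
  odx5 = 0
)
conditions.to.match <- c(123, 321)

# lapply() applies %in% to each odx column, giving one logical vector per column
per_column <- lapply(df[-1], `%in%`, conditions.to.match)
# Reduce(`|`, ...) ORs them together: ((odx1 | odx2) | odx3) | ...
any_match <- Reduce(`|`, per_column)
df[any_match, ]
```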

Answer 3

Base R:

df[apply(df, 1, function(x) {any(x == 123 | x == 321)}),]
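`apply()` calls the function once per row, which may be slow on about 100K rows. A vectorised variant does a single `%in%` over the whole matrix instead. A sketch under the assumption that all odx columns are the same type; its speed relative to `apply()` is untested here:

```r
# Hypothetical copy of the original sample
df <- data.frame(
  Index = 1:5,
  odx1 = c(123, 0, 0, 0, 0),
  odx2 = c(0, 321, 0, 321, 0),
  odx3 = 0,
  odx4 = c(0, 0, 123, 0, 0),
  odx5 = 0
)
targets <- c(123, 321)

m <- as.matrix(df[-1])                          # odx columns only, Index excluded
hit <- matrix(m %in% targets, nrow = nrow(m))   # %in% drops dimensions, so restore them
keep <- rowSums(hit) > 0                        # TRUE if any cell in the row matched
df[keep, ]
```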

dplyr package:

library(dplyr)
# mutate_each() and funs() are deprecated in current dplyr; if_any() (dplyr >= 1.0.4) does the same job
filter(df, if_any(everything(), ~ .x %in% c(123, 321)))

Output:

  Index odx1 odx2 odx3 odx4 odx5
1     1  123    0    0    0    0
2     2    0  321    0    0    0
3     3    0    0    0  123    0
4     4    0  321    0    0    0
