Suppose I have three columns of data (sample1, sample2, and sample3). I want all rows in which the letter b or h appears in any of those columns. This works fine:
data <- data.frame(row_name=c("s1_100","s1_200", "s1_300", "s1_400", "s1_500"),
sample1=rep("a",5),
sample2=c(rep("b",2),rep("a",3)),
sample3=c(rep("a",4),"h")
)
data
# row_name sample1 sample2 sample3
# s1_100 a b a
# s1_200 a b a
# s1_300 a a a
# s1_400 a a a
# s1_500 a a h
bh <- c('b','h')
bh_data <- subset(data, ( sample1 %in% bh | sample2 %in% bh | sample3 %in% bh ) )
bh_data
# row_name sample1 sample2 sample3
# s1_100 a b a
# s1_200 a b a
# s1_500 a a h
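However, since I am asking the same question of every column, is there a less redundant way to do it?
In practice we have more than 800 columns and more than 70,000 rows, and we want to be able to pick as many specific columns as we like to search. With hundreds of column names, for example, the approach above does not look practical unless I write a script that generates the R script.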
Answer 0 (score: 3)
Try
indx <- Reduce(`|`, lapply(df[,-1], `%in%`, bh))
df[indx,]
# row_name sample1 sample2 sample3
#1 s1_100 a b a
#2 s1_200 a b a
#5 s1_500 a a h
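The question also asks about searching only a chosen subset of columns. A minimal sketch of one way to do that in base R, assuming the columns of interest can be picked by a name pattern (the `"^sample"` pattern and the `cols` and `res` names are illustrative assumptions, not part of the original answer):

```r
# Self-contained example data (same values as in the question)
df <- data.frame(
  row_name = c("s1_100", "s1_200", "s1_300", "s1_400", "s1_500"),
  sample1  = rep("a", 5),
  sample2  = c(rep("b", 2), rep("a", 3)),
  sample3  = c(rep("a", 4), "h"),
  stringsAsFactors = FALSE
)
bh <- c("b", "h")

# Assumption: the columns to search all have names starting with "sample".
# Any character vector of column names would work here instead.
cols <- grep("^sample", names(df), value = TRUE)

# Build one logical vector per selected column, then combine them
# with an elementwise OR so a row is kept if any column matches.
indx <- Reduce(`|`, lapply(df[cols], `%in%`, bh))
res <- df[indx, ]
```

Because `cols` is an ordinary character vector, it could also be read from a file or built with `setdiff(names(df), ...)` when hundreds of columns are involved.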
Or using data.table
library(data.table)
nm1 <- paste0("sample", 1:3)
setDT(df)[df[, Reduce(`|`,lapply(.SD, `%in%`, bh)), .SDcols=nm1]]
# row_name sample1 sample2 sample3
#1: s1_100 a b a
#2: s1_200 a b a
#3: s1_500 a a h
Data
df <- structure(list(row_name = c("s1_100", "s1_200", "s1_300", "s1_400",
"s1_500"), sample1 = c("a", "a", "a", "a", "a"), sample2 = c("b",
"b", "a", "a", "a"), sample3 = c("a", "a", "a", "a", "h")), .Names = c("row_name",
"sample1", "sample2", "sample3"), class = "data.frame", row.names = c(NA,
-5L))
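If the set of columns to search changes from run to run, the base R approach can be wrapped in a small function. The following is only a sketch under stated assumptions: `rows_with_any` is a hypothetical helper name, and it expects a plain data.frame rather than a data.table.

```r
# Hypothetical helper: keep the rows of data.frame `d` in which at least
# one of the columns named in `cols` contains one of `values`.
rows_with_any <- function(d, cols, values) {
  # Fail early on misspelled column names instead of a cryptic subsetting error
  missing_cols <- setdiff(cols, names(d))
  if (length(missing_cols) > 0) {
    stop("Unknown columns: ", paste(missing_cols, collapse = ", "))
  }
  # One logical vector per column, combined with elementwise OR
  hits <- lapply(d[cols], function(x) x %in% values)
  d[Reduce(`|`, hits), , drop = FALSE]
}

# Example data with the same values as in the question
df <- data.frame(
  row_name = c("s1_100", "s1_200", "s1_300", "s1_400", "s1_500"),
  sample1  = rep("a", 5),
  sample2  = c(rep("b", 2), rep("a", 3)),
  sample3  = c(rep("a", 4), "h"),
  stringsAsFactors = FALSE
)

# Search only two of the three sample columns
out <- rows_with_any(df, c("sample1", "sample3"), c("b", "h"))
```

Using `Reduce` over a list of logical vectors avoids building an intermediate rows-by-columns matrix, which may matter with roughly 70,000 rows and several hundred selected columns.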