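I want to subset a data frame so that it keeps only the rows in which any of many columns contains a specific string ("ab" in this example). Here is my example: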
>df
ID RESULT1 RESULT2 RESULT3 RESULT4 ... RESULT30
1 001 abc abcd abcdef cdef ... efs
2 002 cd efg hij kl ... fzh
3 003 zabc efg jgh ldc ... bcs
4 004 efx cde lfs ab ... cd
5 005 ftx txs sgs lfc ... edf
6 006 lsd mde ald ldf ... klj
7 007 kjl ell oip lab ... jkl
The expected output would look like this (the rows that have "ab" in any column):
>df.sub
ID RESULT1 RESULT2 RESULT3 RESULT4 ... RESULT30
1 001 abc abcd abcdef cdef ... efs
3 003 zabc efg jgh ldc ... bcs
4 004 efx cde lfs ab ... cd
7 007 kjl ell oip lab ... jkl
Can anyone suggest a solution? I am new to R. Thanks in advance.
Answer 0 (score: 0)
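We loop over the columns of 'df' (dropping the ID column with df[-1]) and use grepl to match the pattern "ab", which returns a list of logical vectors, one per column. We then use Reduce with | to check whether the corresponding elements of any of those vectors are TRUE. The resulting logical vector can be used to subset the rows of the initial dataset.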
df[Reduce(`|`, lapply(df[-1], grepl, pattern="ab")),]
# ID RESULT1 RESULT2 RESULT3 RESULT4 RESULT30
#1 1 abc abcd abcdef cdef efs
#3 3 zabc efg jgh ldc bcs
#4 4 efx cde lfs ab cd
#7 7 kjl ell oip lab jkl
df <- structure(list(ID = 1:7, RESULT1 = c("abc", "cd", "zabc", "efx",
"ftx", "lsd", "kjl"), RESULT2 = c("abcd", "efg", "efg", "cde",
"txs", "mde", "ell"), RESULT3 = c("abcdef", "hij", "jgh", "lfs",
"sgs", "ald", "oip"), RESULT4 = c("cdef", "kl", "ldc", "ab",
"lfc", "ldf", "lab"), RESULT30 = c("efs", "fzh", "bcs", "cd",
"edf", "klj", "jkl")), .Names = c("ID", "RESULT1", "RESULT2",
"RESULT3", "RESULT4", "RESULT30"), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7"))
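To make the two steps in the one-liner above easier to follow, here is a minimal sketch that splits them apart. The data frame `toy` and the names `hits`, `keep` and `toy.sub` are made up for illustration and are not part of the original answer.

```r
# Small stand-in data frame (hypothetical, not the asker's full data)
toy <- data.frame(
  ID = 1:4,
  RESULT1 = c("abc", "cd", "efx", "kjl"),
  RESULT2 = c("abcd", "efg", "cde", "lab"),
  stringsAsFactors = FALSE
)

# Step 1: one logical vector per column; toy[-1] drops the ID column
hits <- lapply(toy[-1], grepl, pattern = "ab")
str(hits)

# Step 2: element-wise OR across the list leaves one logical value per row
keep <- Reduce(`|`, hits)

# Use the logical vector as a row mask
toy.sub <- toy[keep, ]
print(toy.sub)
```

Here rows 1 and 4 are kept, because "abc", "abcd" and "lab" each contain "ab". Note that grepl treats the pattern as a regular expression by default; for a plain string like "ab" that makes no difference.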
Answer 1 (score: 0)
Here is a solution in base R:
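df[rowSums(matrix(grepl("ab", as.matrix(df[-1])), nrow = nrow(df))) > 0, ]
The result of grepl() is always a plain vector, hence the outer matrix(), which restores the row-by-column layout. rowSums() then counts the matches in each row, and the comparison > 0 turns those counts into the logical vector needed for row subsetting.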
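The role of matrix() in this approach can be checked step by step. The following is only a sketch on a small made-up data frame (`toy`, `flat`, `hit_matrix`, `counts` and `keep` are hypothetical names), not the answerer's own code.

```r
# Small stand-in data frame (hypothetical)
toy <- data.frame(
  ID = 1:4,
  RESULT1 = c("abc", "cd", "efx", "kjl"),
  RESULT2 = c("abcd", "efg", "cde", "lab"),
  stringsAsFactors = FALSE
)

m <- as.matrix(toy[-1])        # 4 x 2 character matrix
flat <- grepl("ab", m)         # plain logical vector: the dimensions are gone
print(is.null(dim(flat)))

# Refill column by column so each row lines up with a row of toy again
hit_matrix <- matrix(flat, nrow = nrow(toy))

counts <- rowSums(hit_matrix)  # number of matching cells per row
keep <- counts > 0             # logical mask, TRUE where a row has a match

print(toy[keep, ])
```

Passing `counts` directly as the row index would be read as row positions rather than as a TRUE/FALSE mask, which is why the comparison with 0 is applied first.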