Subset rows by a specific string in any of multiple columns

Time: 2016-05-07 04:23:22

Tags: r

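I want to subset a data frame so that it keeps only the rows containing a specific string ("ab" in this example) in any of many columns. Here is my example: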

>df
    ID  RESULT1   RESULT2   RESULT3   RESULT4   ...   RESULT30
1   001   abc        abcd     abcdef     cdef    ...      efs
2   002   cd          efg       hij       kl     ...      fzh
3   003   zabc        efg       jgh       ldc    ...      bcs
4   004   efx         cde       lfs       ab     ...      cd
5   005   ftx         txs       sgs       lfc    ...      edf
6   006   lsd         mde       ald       ldf    ...      klj
7   007   kjl         ell       oip       lab    ...      jkl

The expected output would look like this (the rows that have "ab" in any column):

>df.sub
   ID   RESULT1   RESULT2   RESULT3   RESULT4   ...   RESULT30
1  001   abc        abcd     abcdef     cdef    ...      efs
3  003   zabc        efg       jgh       ldc    ...      bcs
4  004   efx         cde       lfs       ab     ...      cd
7  007   kjl         ell       oip       lab    ...      jkl

Could anyone suggest a solution? I am new to R. Thanks in advance.

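2 Answers:

Answer 0: (score: 0)

We loop over the columns of 'df' (excluding ID) with lapply and use grepl to match the pattern "ab" in each one, which returns a list of logical vectors. Reduce with `|` then combines that list element-wise, so a position is TRUE if any column matched in that row. The resulting logical vector can be used to subset the rows of the initial dataset.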

df[Reduce(`|`, lapply(df[-1], grepl, pattern="ab")),]
#  ID RESULT1 RESULT2 RESULT3 RESULT4 RESULT30
#1  1     abc    abcd  abcdef    cdef      efs
#3  3    zabc     efg     jgh     ldc      bcs
#4  4     efx     cde     lfs      ab       cd
#7  7     kjl     ell     oip     lab      jkl
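
To make the intermediate steps visible, here is a minimal sketch of the same idea split into three stages. The data frame `toy` and the names `hits`, `keep` and `toy_sub` are made up for illustration and are not part of the original answer.

```r
# Small illustrative data frame (hypothetical; same layout as 'df', fewer rows)
toy <- data.frame(ID = 1:4,
                  RESULT1 = c("abc", "cd", "efx", "ftx"),
                  RESULT2 = c("efg", "hij", "lab", "txs"),
                  stringsAsFactors = FALSE)

# Step 1: one logical vector per column, ID column excluded
hits <- lapply(toy[-1], grepl, pattern = "ab")

# Step 2: element-wise OR across the list, giving one logical value per row
keep <- Reduce(`|`, hits)

# Step 3: use the logical vector as a row index
toy_sub <- toy[keep, ]
print(toy_sub)
```

Here `keep` is TRUE for rows 1 and 3, because "abc" and "lab" contain "ab". Note that grepl treats the pattern as a regular expression by default; for a literal string with special characters, passing `fixed = TRUE` is probably what you want.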

Data

df <- structure(list(ID = 1:7, RESULT1 = c("abc", "cd", "zabc", "efx", 
"ftx", "lsd", "kjl"), RESULT2 = c("abcd", "efg", "efg", "cde", 
"txs", "mde", "ell"), RESULT3 = c("abcdef", "hij", "jgh", "lfs", 
"sgs", "ald", "oip"), RESULT4 = c("cdef", "kl", "ldc", "ab", 
"lfc", "ldf", "lab"), RESULT30 = c("efs", "fzh", "bcs", "cd", 
"edf", "klj", "jkl")), .Names = c("ID", "RESULT1", "RESULT2", 
"RESULT3", "RESULT4", "RESULT30"), class = "data.frame", 
row.names = c("1", "2", "3", "4", "5", "6", "7"))

Answer 1: (score: 0)

Here is a solution in base R:

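df[rowSums(matrix(grepl("ab", as.matrix(df[-1])), nrow = nrow(df[-1]))) > 0, ]

The result of grepl() is always a plain vector, so the matrix dimensions are lost; hence the outer matrix() call, which restores one row per row of 'df'. rowSums() then counts the matches in each row, and comparing that count with 0 yields the logical vector used to select rows. Without the comparison, the counts themselves would be used as row numbers and the wrong rows would be returned.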
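
The following sketch, on a made-up data frame `toy` (all names here are illustrative, not from the original answer), shows why the reshaping step and the comparison with 0 both matter:

```r
# Hypothetical three-row example
toy <- data.frame(ID = 1:3,
                  A = c("abc", "cd", "xy"),
                  B = c("zz", "lab", "kl"),
                  stringsAsFactors = FALSE)

m <- as.matrix(toy[-1])        # 3 x 2 character matrix
flat <- grepl("ab", m)         # logical vector of length 6, no dim attribute
print(is.null(dim(flat)))      # the matrix shape has been dropped

# Restore the shape: values are filled column by column, matching 'm'
hit <- matrix(flat, nrow = nrow(m))

# Number of matching cells per row
counts <- rowSums(hit)

# Convert counts to a logical index before subsetting
toy_sub <- toy[counts > 0, ]
print(toy_sub)
```

Rows 1 and 2 are kept, since "abc" and "lab" contain "ab", while row 3 has no match.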
