Using grep to search for multiple strings in a file

Time: 2015-03-09 15:10:17

Tags: regex r

How can we grep, from a list of files, only the files whose names contain "log2.read.counts.2289_Tail"?

data =

"log2.read.counts.2289_12_Tumor_NF4_CTTGTAA_L002" 
"log2.read.counts.2289_1_Tail_cont_ATCACGA_L002"   
"log2.read.counts.2289_2_Tail_Lmyc_CGATGTA_L002"        
"log2.read.counts.2289_3_Tail_Nfib_TTAGGCA_L002"        
 "log2.read.counts.2289_4_Cell_LmycS3_TGACCAA_L002" 

output =

     "log2.read.counts.2289_1_Tail_cont_ATCACGA_L002"   
     "log2.read.counts.2289_2_Tail_Lmyc_CGATGTA_L002"   
     "log2.read.counts.2289_3_Tail_Nfib_TTAGGCA_L002" 

4 Answers:

Answer 0: (score: 3)

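grepl() returns TRUE if the search pattern matches. Use it to filter the input vector. If you are not familiar with regular expressions, it would be wise to spend some time learning them. In this case, the pattern searches for your string with one or more digits in the middle.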

input <- c("log2.read.counts.2289_12_Tumor_NF4_CTTGTAA_L002", 
           "log2.read.counts.2289_1_Tail_cont_ATCACGA_L002",   
           "log2.read.counts.2289_2_Tail_Lmyc_CGATGTA_L002",        
           "log2.read.counts.2289_3_Tail_Nfib_TTAGGCA_L002",        
           "log2.read.counts.2289_4_Cell_LmycS3_TGACCAA_L002" )
> input[grepl("log2\\.read\\.counts\\.2289_[0-9]+_Tail", input)]
[1] "log2.read.counts.2289_1_Tail_cont_ATCACGA_L002"
[2] "log2.read.counts.2289_2_Tail_Lmyc_CGATGTA_L002"
[3] "log2.read.counts.2289_3_Tail_Nfib_TTAGGCA_L002"

Answer 1: (score: 3)

Here is one way to do it with grep:

fls <- c("log2.read.counts.2289_12_Tumor_NF4_CTTGTAA_L002", 
"log2.read.counts.2289_1_Tail_cont_ATCACGA_L002",   
"log2.read.counts.2289_2_Tail_Lmyc_CGATGTA_L002",        
"log2.read.counts.2289_3_Tail_Nfib_TTAGGCA_L002",        
"log2.read.counts.2289_4_Cell_LmycS3_TGACCAA_L002")

grep("^log2\\.read\\.counts\\.2289_\\d+_Tail", fls, value=TRUE)

## [1] "log2.read.counts.2289_1_Tail_cont_ATCACGA_L002"
## [2] "log2.read.counts.2289_2_Tail_Lmyc_CGATGTA_L002"
## [3] "log2.read.counts.2289_3_Tail_Nfib_TTAGGCA_L002"

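Answer 2: (score: 0)

Assuming you don't already have the list of files:

ls -1 | grep "log2\.read\.counts\.2289_[0-9]\{1,\}_Tail"

[0-9]\{1,\} will match a sequence of digits of any length (in your example the matching names all have a single digit, 1, 2 or 3, but this would also match 23 and 34).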
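As a rough illustration of the repetition count, the sketch below pipes a few names through the same pattern. The `2289_23_Tail_example` entry is hypothetical, added only to show a two-digit case. It assumes a POSIX `grep`, where `\{1,\}` is the basic-regex spelling of "one or more".

```shell
# Sample names; the "23" entry is invented to show a multi-digit match.
names='log2.read.counts.2289_1_Tail_cont_ATCACGA_L002
log2.read.counts.2289_23_Tail_example
log2.read.counts.2289_4_Cell_LmycS3_TGACCAA_L002'

# [0-9]\{1,\} requires at least one digit between the two underscores,
# so both the 1 and the 23 entries match, while the Cell entry does not.
matched=$(printf '%s\n' "$names" | grep 'log2\.read\.counts\.2289_[0-9]\{1,\}_Tail')
printf '%s\n' "$matched"
```

With `grep -E` the same interval can be written without backslashes, as `[0-9]{1,}` or simply `[0-9]+`.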

The periods need to be escaped because they have a special meaning (a period on its own matches any single character).
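To see why the escaping matters, here is a small sketch comparing an unescaped and an escaped period. The `log2Xread...` name is a hypothetical decoy and does not come from the question.

```shell
# Hypothetical decoy: an "X" sits where the first period should be.
names='log2.read.counts.2289_1_Tail_cont_ATCACGA_L002
log2Xread.counts.2289_2_Tail_decoy'

# Unescaped "." matches any single character, so the decoy is counted too.
loose=$(printf '%s\n' "$names" | grep -c 'log2.read')

# Escaped "\." matches only a literal period.
strict=$(printf '%s\n' "$names" | grep -c 'log2\.read')

echo "unescaped: $loose, escaped: $strict"
```

For file names that follow a fixed scheme the unescaped form rarely causes trouble in practice, but escaping keeps the pattern from matching more than intended.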

Answer 3: (score: -2)

grep '^log2.read.counts.2289_Tail'