How can we grep files from a list of files using only the name "log2.read.counts.2289_Tail"?
data =
"log2.read.counts.2289_12_Tumor_NF4_CTTGTAA_L002"
"log2.read.counts.2289_1_Tail_cont_ATCACGA_L002"
"log2.read.counts.2289_2_Tail_Lmyc_CGATGTA_L002"
"log2.read.counts.2289_3_Tail_Nfib_TTAGGCA_L002"
"log2.read.counts.2289_4_Cell_LmycS3_TGACCAA_L002"
output =
"log2.read.counts.2289_1_Tail_cont_ATCACGA_L002"
"log2.read.counts.2289_2_Tail_Lmyc_CGATGTA_L002"
"log2.read.counts.2289_3_Tail_Nfib_TTAGGCA_L002"
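Answer 0 (score: 3)
grepl() returns TRUE for each element that matches the pattern; use it to filter the input vector. If you are not familiar with regular expressions, it may be wise to spend some time learning them. In this case, the pattern searches for your string with one or more digits in the middle.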
input <- c("log2.read.counts.2289_12_Tumor_NF4_CTTGTAA_L002",
"log2.read.counts.2289_1_Tail_cont_ATCACGA_L002",
"log2.read.counts.2289_2_Tail_Lmyc_CGATGTA_L002",
"log2.read.counts.2289_3_Tail_Nfib_TTAGGCA_L002",
"log2.read.counts.2289_4_Cell_LmycS3_TGACCAA_L002" )
> input[grepl("log2\\.read\\.counts\\.2289_[0-9]+_Tail", input)]
[1] "log2.read.counts.2289_1_Tail_cont_ATCACGA_L002"
[2] "log2.read.counts.2289_2_Tail_Lmyc_CGATGTA_L002"
[3] "log2.read.counts.2289_3_Tail_Nfib_TTAGGCA_L002"
Answer 1 (score: 3)
Here is one way to do it with grep:
fls <- c("log2.read.counts.2289_12_Tumor_NF4_CTTGTAA_L002",
"log2.read.counts.2289_1_Tail_cont_ATCACGA_L002",
"log2.read.counts.2289_2_Tail_Lmyc_CGATGTA_L002",
"log2.read.counts.2289_3_Tail_Nfib_TTAGGCA_L002",
"log2.read.counts.2289_4_Cell_LmycS3_TGACCAA_L002")
grep("^log2\\.read\\.counts\\.2289_\\d+_Tail", fls, value=TRUE)
## [1] "log2.read.counts.2289_1_Tail_cont_ATCACGA_L002"
## [2] "log2.read.counts.2289_2_Tail_Lmyc_CGATGTA_L002"
## [3] "log2.read.counts.2289_3_Tail_Nfib_TTAGGCA_L002"
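Answer 2 (score: 0)
Assuming you don't already have the list of files: ls -1 | grep -E "log2\.read\.counts\.2289_[0-9]{1,}_Tail"
[0-9]{1,} matches a digit sequence of any length (in your example they are all single digits: 1, 2, 3, but this would also match 23 and 34). The -E flag is needed so that grep reads {1,} as a repetition count.
The periods need to be escaped because they have a special meaning: a period on its own matches any single character.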
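The pattern described above can be tried without touching the file system. The following is a minimal sketch, assuming a POSIX-style grep that supports -E; the variable names (names, pattern, matches) are illustrative, and the sample data is simply the five names from the question.

```shell
# Illustrative sample: the five names from the question, one per line.
names='log2.read.counts.2289_12_Tumor_NF4_CTTGTAA_L002
log2.read.counts.2289_1_Tail_cont_ATCACGA_L002
log2.read.counts.2289_2_Tail_Lmyc_CGATGTA_L002
log2.read.counts.2289_3_Tail_Nfib_TTAGGCA_L002
log2.read.counts.2289_4_Cell_LmycS3_TGACCAA_L002'

# -E enables extended regular expressions, so {1,} is read as "one or more".
# Each \. matches a literal period; ^ anchors the match at the start of the name.
pattern='^log2\.read\.counts\.2289_[0-9]{1,}_Tail'

# Keep only the lines that match the pattern, then print them.
matches=$(printf '%s\n' "$names" | grep -E "$pattern")
printf '%s\n' "$matches"
```

With real files, the printf line could be swapped for ls -1 as in the answer; the pattern itself stays the same.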
Answer 3 (score: -2)
grep '^log2.read.counts.2289_Tail'