How to append the file name to a string inside a file

Time: 2016-01-14 15:45:58

Tags: perl shell unix

I have several files in the following format:

chr10   Cufflinks   transcript  92828   95504   1   -   .   gene_id "CUFF.1"; transcript_id "ENST00000447903"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000"; full_read_support "no";
chr10   Cufflinks   exon    92828   94054   1   -   .   gene_id "CUFF.1"; transcript_id "ENST00000447903"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr10   Cufflinks   exon    94555   94665   1   -   .   gene_id "CUFF.1"; transcript_id "ENST00000447903"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr10   Cufflinks   exon    94744   94852   1   -   .   gene_id "CUFF.1"; transcript_id "ENST00000447903"; exon_number "3"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr10   Cufflinks   exon    95348   95504   1   -   .   gene_id "CUFF.1"; transcript_id "ENST00000447903"; exon_number "4"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";

What I want to do is append the file name to every CUFF* string in the input file. My file is named sample_1, so the output should look like this:

chr10   Cufflinks   transcript  92828   95504   1   -   .   gene_id "CUFF.1_sample_1"; transcript_id "ENST00000447903"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000"; full_read_support "no";
chr10   Cufflinks   exon    92828   94054   1   -   .   gene_id "CUFF.1_sample_1"; transcript_id "ENST00000447903"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr10   Cufflinks   exon    94555   94665   1   -   .   gene_id "CUFF.1_sample_1"; transcript_id "ENST00000447903"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr10   Cufflinks   exon    94744   94852   1   -   .   gene_id "CUFF.1_sample_1"; transcript_id "ENST00000447903"; exon_number "3"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr10   Cufflinks   exon    95348   95504   1   -   .   gene_id "CUFF.1_sample_1"; transcript_id "ENST00000447903"; exon_number "4"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";

This is what I have tried so far:

cat sample_1 | sed 's/CUFF*/CUFF*_sample1/g'

Any Unix one-liner would be great...

1 answer:

Answer 0: (score: 3)

sed, and regular expressions in particular, do not work like that. Read perlre to learn how to write regular expressions.

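In particular, * does not work the way you are used to from the shell: it is a pattern quantifier, not a wildcard. It applies to the preceding "atom". So in your expression you are replacing "CUF" followed by zero or more instances of "F". That will match "CUF", "CUFF" and "CUFFFFFFFF".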

But not "CUFF.1" as a whole: only the leading "CUFF" is matched, and the ".1" is left where it was.

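On the right-hand side of the expression (the replacement), * does not even do that: it is not special there and is inserted as a literal asterisk.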
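To make this concrete, here is a small check of what the original attempt produces. The input is a made-up, shortened stand-in for one line of the data above, not a full record.

```shell
# Hypothetical shortened input line, fed through the sed command from the question
result=$(printf 'gene_id "CUFF.1";\n' | sed 's/CUFF*/CUFF*_sample1/g')
echo "$result"
# prints: gene_id "CUFF*_sample1.1";
```

The matched "CUFF" is replaced by the literal text "CUFF*_sample1", and the untouched ".1" ends up after it.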

Perhaps you want:

perl -pe 's/(CUFF[^"]+)/${1}_sample_1/g' sample_1

If you want to edit the file in place, use -i.

(Note: I am using perl because it just works. You can of course do something very similar with sed.)
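For reference, a sed version might look like the following sketch. It is not part of the original answer: the scratch directory, the shortened file contents and the tagged_ output prefix are assumptions made for the demo. The result is written to a new file instead of relying on sed's in-place option, which differs between implementations.

```shell
# Scratch directory with one tiny stand-in file (hypothetical contents)
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'gene_id "CUFF.1"; exon_number "1";\n' > sample_1

for f in sample_*; do
  # \(...\) captures CUFF plus everything up to the closing quote;
  # \1 puts the captured text back, followed by _ and the file name
  sed "s/\(CUFF[^\"]*\)/\1_$f/g" "$f" > "tagged_$f"
done

cat tagged_sample_1
# prints: gene_id "CUFF.1_sample_1"; exon_number "1";
```

Because the file name is pasted into the sed script, names containing / or & would need escaping first.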