I have several files in the following format:
chr10 Cufflinks transcript 92828 95504 1 - . gene_id "CUFF.1"; transcript_id "ENST00000447903"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000"; full_read_support "no";
chr10 Cufflinks exon 92828 94054 1 - . gene_id "CUFF.1"; transcript_id "ENST00000447903"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr10 Cufflinks exon 94555 94665 1 - . gene_id "CUFF.1"; transcript_id "ENST00000447903"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr10 Cufflinks exon 94744 94852 1 - . gene_id "CUFF.1"; transcript_id "ENST00000447903"; exon_number "3"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr10 Cufflinks exon 95348 95504 1 - . gene_id "CUFF.1"; transcript_id "ENST00000447903"; exon_number "4"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
What I want to do is append the file name to every CUFF* identifier in the input file. My file is named sample_1, so the output should look like this:
chr10 Cufflinks transcript 92828 95504 1 - . gene_id "CUFF.1_sample_1"; transcript_id "ENST00000447903"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000"; full_read_support "no";
chr10 Cufflinks exon 92828 94054 1 - . gene_id "CUFF.1_sample_1"; transcript_id "ENST00000447903"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr10 Cufflinks exon 94555 94665 1 - . gene_id "CUFF.1_sample_1"; transcript_id "ENST00000447903"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr10 Cufflinks exon 94744 94852 1 - . gene_id "CUFF.1_sample_1"; transcript_id "ENST00000447903"; exon_number "3"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
chr10 Cufflinks exon 95348 95504 1 - . gene_id "CUFF.1_sample_1"; transcript_id "ENST00000447903"; exon_number "4"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
This is what I have tried so far:
cat sample_1 | sed 's/CUFF*/CUFF*_sample1/g'
Any Unix one-liner would be great...
Answer 0 (score: 3)
sed, and regular expressions in particular, don't work that way. Read perlre to learn how to write regular expressions.
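In particular, * doesn't work the way you are used to from the shell: it is a pattern quantifier, not a wildcard. It applies to the preceding "symbol". So in your expression you are replacing "CUF" followed by zero or more instances of "F". It will therefore match "CUF", "CUFF" and "CUFFFFFFFF".
But not the whole of "CUFF.1": the match stops after "CUFF" and the ".1" is left where it was.
On the right-hand side of the expression, it doesn't even do that. There, * is just a literal asterisk that gets copied into the output.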
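To see both effects concretely, here is a small sketch. The test strings and the marker X are made up for illustration; only the two sed expressions come from the discussion above.

```shell
# CUFF* means "CUF" followed by zero or more "F", so each word below
# is matched only up to its last F; the ".1" is never part of the match.
demo=$(printf 'CUF CUFF CUFFFF CUFF.1\n' | sed 's/CUFF*/X/g')
echo "$demo"

# On the replacement side, * is a literal asterisk and is copied as-is.
literal=$(printf 'gene_id "CUFF.1";\n' | sed 's/CUFF*/CUFF*_sample1/g')
echo "$literal"
```

The first command prints X X X X.1, and the second prints gene_id "CUFF*_sample1.1"; which is why the original attempt puts the suffix in front of the ".1" instead of after it.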
Perhaps you want:
perl -pe 's/(CUFF[^"]+)/${1}_sample_1/g' sample_1
If you want to edit the file in place, use -i.
(Note: I am using perl because it just works. You can of course do something very similar with sed.)
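For completeness, a sed version might look like the sketch below, extended to several files so that each one gets its own name appended. The directory, the two shortened records and the .renamed output suffix are assumptions made for this example, not part of the question.

```shell
# Throwaway directory with two tiny stand-in files (made-up, shortened records).
workdir=$(mktemp -d)
printf 'chr10 Cufflinks exon 1 2 1 - . gene_id "CUFF.1"; transcript_id "ENST00000447903";\n' > "$workdir/sample_1"
printf 'chr10 Cufflinks exon 3 4 1 - . gene_id "CUFF.2"; transcript_id "ENST00000447903";\n' > "$workdir/sample_2"

# \(CUFF[^"]*\) captures CUFF plus everything up to the closing quote;
# \1 puts the captured text back, followed by an underscore and the file name.
for f in "$workdir"/sample_*; do
    name=$(basename "$f")
    sed "s/\(CUFF[^\"]*\)/\1_${name}/g" "$f" > "$f.renamed"
done

cat "$workdir"/*.renamed
```

Because the file name is pasted into the sed replacement text, this only behaves well for names without characters that sed treats specially there, such as / or &.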