Renaming lines that follow a pattern match, and the headers, throughout a file

Time: 2014-10-13 06:21:17

Tags: python perl

My file looks like this:

BLOCK: offset: 59051 len: 1615 phased: 37 SPAN: 1614 MECscore 65.96 fragments 266
59294   0   1   Locus_540_Transcript_32_Length_8324_genewise_newlength_8215__CDS__3870__6491    513 C   A   0/1:23,12:35:99:262,0,691   19,10:-40.6,-28.8,-78.7:-11.9:6.0
59876   0   1   Locus_540_Transcript_32_Length_8324_genewise_newlength_8215__CDS__3870__6491    1095    G   A   0/1:35,12:47:99:328,0,1157  30,11:-61.1,-63.4,-134.7:2.2:12.0
59998   0   1   Locus_540_Transcript_32_Length_8324_genewise_newlength_8215__CDS__3870__6491    1217    G   A   0/1:22,12:34:99:314,0,730   20,10:-68.4,-54.2,-109.0:-14.2:6.0
60000   0   1   Locus_540_Transcript_32_Length_8324_genewise_newlength_8215__CDS__3870__6491    1219    A   C   0/1:22,12:34:99:308,0,715   20,10:-69.9,-54.2,-107.7:-15.7:6.0
60502   0   1   Locus_540_Transcript_32_Length_8324_genewise_newlength_8215__CDS__3870__6491    1721    G   C   0/1:15,6:21:99:141,0,464    7,5:-21.8,-18.5,-30.1:-3.3:4.0
BLOCK: offset: 60874 len: 79 phased: 3 SPAN: 78 MECscore 11.99 fragments 21
60952   0   1   Locus_540_Transcript_32_Length_8324_genewise_newlength_8215__CDS__3870__6491    2171    G   C   0/1:14,13:27:99:388,0,369   9,5:-35.3,-26.5,-46.7:-8.7:3.0
BLOCK: offset: 62339 len: 3617 phased: 123 SPAN: 3616 MECscore 1516.57 fragments 4565
62442   1   0   Locus_540_Transcript_32_Length_8324_genewise_newlength_8215__CDS__3870__6491    3661    G   A   0/1:148,55:203:99:1070,0,4008   107,39:-163.0,-160.9,-438.4:-2.1:33.0
62481   1   0   Locus_540_Transcript_32_Length_8324_genewise_newlength_8215__CDS__3870__6491    3700    C   T

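I want to read through the file and rename the first field of every line so that it is grouped under the preceding "BLOCK" line. I also want to rename the "BLOCK" lines themselves: the first becomes "BLOCK1", the second "BLOCK2", and so on. My desired output looks like this: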

BLOCK1: offset: 59051 len: 1615 phased: 37 SPAN: 1614 MECscore 65.96 fragments 266
BLOCK1  0   1   Locus_540_Transcript_32_Length_8324_genewise_newlength_8215__CDS__3870__6491    513 C   A   0/1:23,12:35:99:262,0,691   19,10:-40.6,-28.8,-78.7:-11.9:6.0
BLOCK1  0   1   Locus_540_Transcript_32_Length_8324_genewise_newlength_8215__CDS__3870__6491    1095    G   A   0/1:35,12:47:99:328,0,1157  30,11:-61.1,-63.4,-134.7:2.2:12.0
BLOCK1  0   1   Locus_540_Transcript_32_Length_8324_genewise_newlength_8215__CDS__3870__6491    1217    G   A   0/1:22,12:34:99:314,0,730   20,10:-68.4,-54.2,-109.0:-14.2:6.0
BLOCK1  0   1   Locus_540_Transcript_32_Length_8324_genewise_newlength_8215__CDS__3870__6491    1219    A   C   0/1:22,12:34:99:308,0,715   20,10:-69.9,-54.2,-107.7:-15.7:6.0
BLOCK1  0   1   Locus_540_Transcript_32_Length_8324_genewise_newlength_8215__CDS__3870__6491    1721    G   C   0/1:15,6:21:99:141,0,464    7,5:-21.8,-18.5,-30.1:-3.3:4.0
BLOCK2: offset: 60874 len: 79 phased: 3 SPAN: 78 MECscore 11.99 fragments 21
BLOCK2  0   1   Locus_540_Transcript_32_Length_8324_genewise_newlength_8215__CDS__3870__6491    2171    G   C   0/1:14,13:27:99:388,0,369   9,5:-35.3,-26.5,-46.7:-8.7:3.0
BLOCK3: offset: 62339 len: 3617 phased: 123 SPAN: 3616 MECscore 1516.57 fragments 4565
BLOCK3  1   0   Locus_540_Transcript_32_Length_8324_genewise_newlength_8215__CDS__3870__6491    3661    G   A   0/1:148,55:203:99:1070,0,4008   107,39:-163.0,-160.9,-438.4:-2.1:33.0
BLOCK3  1   0   Locus_540_Transcript_32_Length_8324_genewise_newlength_8215__CDS__3870__6491    3700    C   T

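I am new to programming and have tried awk/sed and perl, but I can't seem to figure this out :( I would really appreciate some help, preferably with some explanation of each line of code. Thanks very much!!!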

1 answer:

Answer 0: (score: 0)

Using a Perl one-liner:

perl -pe 's/^BLOCK\K/++$i/e or s/^\d+/"BLOCK$i"/e' file.txt 

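Switches:

  • -p: creates a while(<>){...; print} loop that runs the code on each "line" of the input file and then prints it.
  • -e: tells perl to execute the code given on the command line.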
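
Since the question is also tagged python and asks for a line-by-line explanation, here is a rough Python sketch of what the one-liner does. It is not part of the original answer; the function name rename_blocks and the shortened sample lines are made up for illustration. Like the Perl version, it keeps a counter, bumps it on every line that starts with BLOCK, and otherwise replaces the leading digits with the current label. One small difference: a data line that appears before any BLOCK header would be labelled BLOCK0 here.

```python
import re


def rename_blocks(lines):
    """Yield lines with BLOCK headers numbered and data lines relabelled to match.

    Hypothetical helper for illustration; mirrors the Perl one-liner's logic.
    """
    count = 0  # how many BLOCK headers have been seen so far
    for line in lines:
        if line.startswith("BLOCK"):
            # Header line: increment the counter and insert it right after "BLOCK"
            # (the Perl s/^BLOCK\K/++$i/e step).
            count += 1
            yield "BLOCK" + str(count) + line[len("BLOCK"):]
        else:
            # Data line: replace the leading run of digits with the current label
            # (the Perl s/^\d+/"BLOCK$i"/e step). Lines without leading digits
            # pass through unchanged.
            yield re.sub(r"^\d+", "BLOCK" + str(count), line)


# Shortened sample lines, abbreviated from the question's data.
sample = [
    "BLOCK: offset: 59051 len: 1615\n",
    "59294   0   1\n",
    "BLOCK: offset: 60874 len: 79\n",
    "60952   0   1\n",
]
for renamed in rename_blocks(sample):
    print(renamed, end="")
```

To run it on a real file, you could pass an open file object instead of the sample list, for example rename_blocks(open("file.txt")), since iterating over a file yields its lines one at a time.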
