Perl does not print the last line correctly

Asked: 2014-06-09 17:34:58

Tags: regex perl file

I have a large file that looks like this:

scaffold_58 Cufflinks   exon    753 993 .   +   .   gene_id "GRMZM6G781015";transcript_id "GRMZM6G781015_T03";tss_id "TSS125032"
scaffold_58 Cufflinks   exon    753 1642    .   +   .   gene_id "GRMZM6G781015";transcript_id "GRMZM6G781015_T02";tss_id "TSS125032"
scaffold_58 Cufflinks   exon    753 801 .   +   .   gene_id "GRMZM6G781015";transcript_id "GRMZM6G781015_T01";tss_id "TSS125032"
scaffold_58 Cufflinks   exon    871 993 .   +   .   gene_id "GRMZM6G781015";transcript_id "GRMZM6G781015_T01";tss_id "TSS125032"

The following code uses regular expressions to change the gene_id:

use warnings;
open $final, ">", "./newassembly.gtf";
open NEWREF3, "<", $ARGV[0];
while ($line = <NEWREF3>) {
    if ($line =~ /gene_id "([A-Za-z0-9:\-._]*_[oO])([_.][0-9]*)";/) {
        $genename = $1; $ext = $2;
        $allname = $genename.$ext;
        if (!defined $hash_o_count{$genename}{$allname}) {
            $num = keys %{$hash_o_count{$genename}};
            $hash_o_count{$genename}{$allname} = $num + 1;
        }
        $num = keys %{$hash_o_count{$genename}};
        $line =~ s/gene_id "([A-Za-z0-9:\-._]*_[oO])([_.])[0-9]*";/gene_id "$1$2$hash_o_count{$genename}{$allname}";/g;
        print $final $line;
    }
    elsif ($line =~ /gene_id "([A-Za-z0-9:\-._]*_[xX])([_.][0-9]*)";/) {
        $genename = $1; $ext = $2;
        $allname = $genename.$ext;
        if (!defined $hash_x_count{$genename}{$allname}) {
            $num = keys %{$hash_x_count{$genename}};
            $hash_x_count{$genename}{$allname} = $num + 1;
        }
        $num = keys %{$hash_x_count{$genename}};
        $line =~ s/gene_id "([A-Za-z0-9:\-._]*_[xX])([_.])[0-9]*";/gene_id "$1$2$hash_x_count{$genename}{$allname}";/g;
        print $final $line;
    }
    else {
        print $final $line;
    }
}
close NEWREF3;

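However, the output of this code ends with a truncated line. The last line should be handled by the final else branch of the code and printed unchanged. Running head on the output file shows: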

scaffold_58 Cufflinks   exon    1153    1642    .   +   .   gene_id "GRMZM6G781015";transcript_id "GRMZM6G781015_T01";tss_id "TSS125032"
scaffold_6  Cufflinks   exon    1   289 .   +   .   gene_id "GRMZM6G441368";transcript_id "GRMZM6G441368_T01";tss_id "TSS125033"
scaffold_6  Cufflinks   exon    517 591 .   +   .   gene_id "GRMZM6G441368";transcript_id "GRMZM6G441368_T01";tss_id "TSS125033"
scaffold_6  Cufflinks   exon    683 905 computer@computer:/home...

Why does this happen, and how can I avoid it?

Thanks.

1 answer:

Answer 0 (score: 0):

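Try closing the output file ./newassembly.gtf. Output written with print is buffered, and closing the handle flushes that buffer to disk, so closing every file you open is always good practice.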

Doing so is simple: add close $final; after close NEWREF3;
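As a minimal sketch of that advice, the read/write loop can be written with lexical filehandles, error checks on open, and an explicit close on the output handle (the handle called $final in the question). The sub name copy_lines and its two path arguments are hypothetical and not part of the original post, and the gene_id substitutions are left out for brevity. Whether the missing close is the real cause of the truncation is not confirmed in this thread; the sketch only shows the pattern.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): copies every line of
# $in_path to $out_path and closes both handles explicitly.
sub copy_lines {
    my ($in_path, $out_path) = @_;

    open my $in,  '<', $in_path  or die "Cannot open $in_path: $!";
    open my $out, '>', $out_path or die "Cannot open $out_path: $!";

    while (my $line = <$in>) {
        # The gene_id substitutions from the question would go here.
        print {$out} $line;
    }

    close $in;
    # close flushes any buffered output; if it fails, some data may not
    # have reached the file, so the return value is checked.
    close $out or die "Cannot close $out_path: $!";
    return;
}
```

It could be called as copy_lines($ARGV[0], './newassembly.gtf'); in place of the bare open/while/close sequence. Checking the return value of close on an output handle is worthwhile because a write error (for example, a full disk) may only surface when the buffer is flushed.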