I have a large file that looks like this:
scaffold_58 Cufflinks exon 753 993 . + . gene_id "GRMZM6G781015";transcript_id "GRMZM6G781015_T03";tss_id "TSS125032"
scaffold_58 Cufflinks exon 753 1642 . + . gene_id "GRMZM6G781015";transcript_id "GRMZM6G781015_T02";tss_id "TSS125032"
scaffold_58 Cufflinks exon 753 801 . + . gene_id "GRMZM6G781015";transcript_id "GRMZM6G781015_T01";tss_id "TSS125032"
scaffold_58 Cufflinks exon 871 993 . + . gene_id "GRMZM6G781015";transcript_id "GRMZM6G781015_T01";tss_id "TSS125032"
The following code uses regular expressions to change the gene_id:
use warnings;
open $final, ">", "./newassembly.gtf";
open NEWREF3, "<", $ARGV[0];
while ($line = <NEWREF3>) {
if ($line =~ /gene_id "([A-Za-z0-9:\-._]*_[oO])([_.][0-9]*)";/) {
$genename = $1; $ext = $2;
$allname = $genename.$ext;
if (!defined $hash_o_count{$genename}{$allname}) {
$num = keys %{$hash_o_count{$genename}};
$hash_o_count{$genename}{$allname} = $num + 1;
}
$num = keys %{$hash_o_count{$genename}};
$line =~ s/gene_id "([A-Za-z0-9:\-._]*_[oO])([_.])[0-9]*";/gene_id "$1$2$hash_o_count{$genename}{$allname}";/g;
print $final $line;
}
elsif ($line =~ /gene_id "([A-Za-z0-9:\-._]*_[xX])([_.][0-9]*)";/) {
$genename = $1; $ext = $2;
$allname = $genename.$ext;
if (!defined $hash_x_count{$genename}{$allname}) {
$num = keys %{$hash_x_count{$genename}};
$hash_x_count{$genename}{$allname} = $num + 1;
}
$num = keys %{$hash_x_count{$genename}};
$line =~ s/gene_id "([A-Za-z0-9:\-._]*_[xX])([_.])[0-9]*";/gene_id "$1$2$hash_x_count{$genename}{$allname}";/g;
print $final $line;
}
else {
print $final $line;
}
}
close NEWREF3;
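However, the output of this code has a truncated line at the end of the file... the last lines should simply pass through the final part of the code. Inspecting the end of the output file shows: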
...
scaffold_58 Cufflinks exon 1153 1642 . + . gene_id "GRMZM6G781015";transcript_id "GRMZM6G781015_T01";tss_id "TSS125032"
scaffold_6 Cufflinks exon 1 289 . + . gene_id "GRMZM6G441368";transcript_id "GRMZM6G441368_T01";tss_id "TSS125033"
scaffold_6 Cufflinks exon 517 591 . + . gene_id "GRMZM6G441368";transcript_id "GRMZM6G441368_T01";tss_id "TSS125033"
scaffold_6 Cufflinks exon 683 905 computer@computer:/home...
Why does this happen, and how can I avoid it?
Thanks.
Answer 0: (score: 0)
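Try closing the file ./newassembly.gtf. Closing files is always good practice: output to a file is buffered, so the last chunk of data may not reach the disk until the handle is closed.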
Doing so is simple. Add
close $final;
after close NEWREF3;
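As a rough illustration of the same idea, here is a minimal sketch that uses lexical filehandles, checks that each open succeeded, and closes the output handle explicitly. The helper name copy_lines is hypothetical and not part of the original script; the gene_id substitutions from the question are left out and would go inside the loop.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): copy every line from
# $in_path to $out_path and close both handles explicitly.
sub copy_lines {
    my ($in_path, $out_path) = @_;

    open my $in,  '<', $in_path  or die "Cannot open $in_path: $!";
    open my $out, '>', $out_path or die "Cannot open $out_path: $!";

    while (my $line = <$in>) {
        # The gene_id substitutions from the question would go here.
        print {$out} $line;
    }

    close $in;
    # close flushes any buffered output; checking its return value
    # catches write errors such as a full disk.
    close $out or die "Cannot close $out_path: $!";
    return;
}

# Same invocation style as the question: input file as the first argument.
copy_lines($ARGV[0], './newassembly.gtf') if @ARGV;
```

Checking the return value of close on the output handle is optional, but it is a cheap way to notice that the last buffered data could not be written.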