Hello, I am trying to delete part of a file's contents based on a regular expression match. Here is the code:
my $file = "Cioin_PatchAnalysis.txt";
local $/ = 'Query=';
my @content = ();
open (INFILE, $file) || die "error2: $!";
while (<INFILE>)
{
    chomp;
    if ($_ =~ /\s*3374_Cioin/)
    {   # capture the query sequence
        @content = $_;
        print @content;
    }
}
The sample data is:
===================================================================
Query= 3374_Cioin
(24,267 letters)
Database: /home/aprasanna/BLAST/DMel_renamedfile.fasta
14,047 sequences; 7,593,731 total letters
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= 578_Antlo
(88 letters)
=========================================================
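I want to remove everything from Query= 3374_Cioin through the line ending in 25:3389-3402., that is, up to the next record separator. I am able to store the matched part in @content. However, I cannot delete it from the original file. I want my original file to contain only the Query= 578_Antlo record!
I am new to Perl.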
Answer 0 (score: 1)
The simplest approach is to write all the records you want to keep to a different file.
I would suggest something like:
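use strict;
use warnings;

my $file    = "Cioin_PatchAnalysis.txt";
my $outfile = "Fixed_Cioin_PatchAnalysis.txt";
local $/ = 'Query=';    # read one 'Query=' record at a time
open(my $in,  '<', $file)    or die "Could not open file '$file': $!";
open(my $out, '>', $outfile) or die "Could not open file '$outfile': $!";
while (<$in>)
{
    chomp;                         # strips the trailing 'Query=' separator
    next if /^\s*3374_Cioin\b/;    # skip the record to delete
    # chomp removed the separator, so put it back in front of every
    # record except the first chunk (the text before the first 'Query=')
    my $prefix = $. == 1 ? '' : 'Query=';
    print {$out} $prefix, $_;
}
close($in);
close($out) or die "Could not close file '$outfile': $!";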
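Because the loop sets $/ to 'Query=', each read returns one chunk that ends with the separator, and chomp strips that separator off. The following is only a small sketch to make that visible. It reads from an in-memory string instead of the real file, and split_records is a hypothetical helper name, not something from the question:

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into chunks using 'Query=' as the
# input record separator, the same way the loop above reads the file.
sub split_records {
    my ($text) = @_;
    local $/ = 'Query=';
    # Open a read handle on the string itself (in-memory file).
    open(my $fh, '<', \$text) or die "Could not open in-memory handle: $!";
    my @records;
    while (my $record = <$fh>) {
        chomp $record;    # removes a trailing 'Query=' if there is one
        push @records, $record;
    }
    close($fh);
    return @records;
}

# Made-up sample text, only for illustration.
my @records = split_records("header\nQuery= A\nfoo\nQuery= B\nbar\n");
print scalar(@records), " chunks\n";
print "[$_]\n" for @records;
```

The first chunk is whatever precedes the first 'Query=', and none of the chunks contain the separator any more, so anything written back out needs 'Query=' added in front of it again.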
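You can then replace the original with the new file. Another option is to keep all the records that do not match the regex in memory and then print them back to the original file: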
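use strict;
use warnings;

my $file = "Cioin_PatchAnalysis.txt";
local $/ = 'Query=';    # read one 'Query=' record at a time
my @content;
open(my $in, '<', $file) or die "Could not open file '$file': $!";
while (<$in>)
{
    chomp;                         # strips the trailing 'Query=' separator
    next if /^\s*3374_Cioin\b/;    # skip the record to delete
    # put the separator back in front of every record except the first chunk
    push @content, ($. == 1 ? '' : 'Query=') . $_;
}
close($in);
# reopen the same file for writing only after reading is done; '>' truncates it
open(my $out, '>', $file) or die "Could not open file '$file': $!";
print {$out} @content;
close($out) or die "Could not close file '$file': $!";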
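For the first suggestion, replacing the original with the new file can be done with Perl's built-in rename. This is a rough sketch rather than a definitive implementation: remove_query_record and the "$file.tmp" temporary name are assumptions made for illustration, and how rename behaves when the target already exists can differ between platforms.

```perl
use strict;
use warnings;

# Hypothetical helper: drop every 'Query=' record whose name is $name,
# writing to a temporary file first and then renaming it over the original.
sub remove_query_record {
    my ($file, $name) = @_;
    my $tmp = "$file.tmp";    # assumed temporary name in the same directory
    local $/ = 'Query=';
    open(my $in,  '<', $file) or die "Could not open '$file': $!";
    open(my $out, '>', $tmp)  or die "Could not open '$tmp': $!";
    my $count = 0;
    while (my $record = <$in>) {
        chomp $record;                 # strip the trailing 'Query='
        my $first = (++$count == 1);   # text before the first 'Query='
        # The name must be followed by whitespace or the end of the chunk.
        next if !$first && $record =~ /^\s*\Q$name\E(?!\S)/;
        my $prefix = $first ? '' : 'Query=';
        print {$out} $prefix . $record;
    }
    close($in);
    close($out) or die "Could not close '$tmp': $!";
    rename($tmp, $file) or die "Could not rename '$tmp' to '$file': $!";
    return;
}

# Usage: perl script.pl Cioin_PatchAnalysis.txt 3374_Cioin
remove_query_record(@ARGV) if @ARGV == 2;
```

Writing to a separate file and renaming it only at the end means the original stays untouched if something fails partway through, which is the main advantage over truncating and rewriting the file in place.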