Question

Hello, I am trying to delete part of a file's content based on a regular expression match. Here is my code:

my $file = "Cioin_PatchAnalysis.txt";
local $/ = 'Query=';
my @content = ();
open (INFILE, $file) || die "error2: $!";
while (<INFILE>) 
    {
    chomp;
    if ($_ =~ /\s*3374_Cioin/) 
     {#capture the query sequence
        @content = $_;
        print @content;
     }
    }

The sample data is:

===================================================================
Query= 3374_Cioin
         (24,267 letters)

Database: /home/aprasanna/BLAST/DMel_renamedfile.fasta 
           14,047 sequences; 7,593,731 total letters

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= 578_Antlo
         (88 letters)
=========================================================

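I want to remove everything from "Query= 3374_Cioin" through "25:3389-3402.", that is, up to the next record separator. I am able to store the matching part in @content, but I cannot delete it from the original file. I want my original file to contain only the Query= 578_Antlo record!

I am new to Perl.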

Answer 1

The simplest approach is to write all the records you want to keep to a different file.

I would suggest something like:

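use strict;
use warnings;

my $file    = "Cioin_PatchAnalysis.txt";
my $outfile = "Fixed_Cioin_PatchAnalysis.txt";

local $/ = 'Query=';    # read one record at a time instead of one line

open(my $in,  '<', $file)    or die "Could not open '$file': $!";
open(my $out, '>', $outfile) or die "Could not open '$outfile': $!";

while (my $record = <$in>) {
    chomp $record;                            # strip the trailing 'Query='
    next if $record =~ /^\s*3374_Cioin\b/;    # skip the record to be removed
    print {$out} 'Query=' if $. > 1;          # restore the separator that chomp removed
    print {$out} $record;                     # record 1 is the text before the first 'Query='
}

close $in;
close $out or die "Could not close '$outfile': $!";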
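To see why the loop works on whole records rather than on lines, here is a small sketch that reads an abbreviated copy of the sample data from an in-memory string with `$/` set to `'Query='`. The helper name `split_records` and the shortened sample text are assumptions made for illustration; they are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into records the same way the
# loop does, using 'Query=' as the input record separator.
sub split_records {
    my ($text) = @_;
    local $/ = 'Query=';
    open(my $fh, '<', \$text) or die "Could not open in-memory string: $!";
    my @records;
    while (my $record = <$fh>) {
        chomp $record;    # removes the trailing 'Query=' when it is present
        push @records, $record;
    }
    close $fh;
    return @records;
}

# Abbreviated, made-up stand-in for the sample data
my $sample = "header\nQuery= 3374_Cioin\n(24,267 letters)\nQuery= 578_Antlo\n(88 letters)\n";
print "[$_]\n" for split_records($sample);
```

Each element holds the text between two separators, so the first element is whatever precedes the first `Query=`, and every later element starts with the space and ID that followed `Query=` in the file.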

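You can then replace the original with the new file. Another option is to keep all the records that do not match the regex in memory, and then print them back to the original file: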

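use strict;
use warnings;

my $file = "Cioin_PatchAnalysis.txt";
local $/ = 'Query=';    # read one record at a time instead of one line
my @content;

open(my $in, '<', $file) or die "Could not open '$file': $!";
while (my $record = <$in>) {
    chomp $record;                            # strip the trailing 'Query='
    next if $record =~ /^\s*3374_Cioin\b/;    # drop the record to be removed
    # Restore the separator that chomp removed; record 1 is the text before the first 'Query='
    push @content, ($. > 1 ? 'Query=' : '') . $record;
}
close $in;

# Reopening the same file with '>' truncates it; then write back the kept records
open(my $out, '>', $file) or die "Could not open '$file': $!";
print {$out} @content;
close $out or die "Could not close '$file': $!";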
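For the first approach, the "replace the original with the new file" step can be done with Perl's built-in `rename`. The following is a minimal sketch, assuming the filtered copy is written next to the original; the sub name `remove_record`, its `$id` parameter, and the `.tmp` suffix are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the record whose ID follows 'Query=' from $file.
sub remove_record {
    my ($file, $id) = @_;
    my $tmp = "$file.tmp";    # assumed temporary name, same directory as $file

    local $/ = 'Query=';
    open(my $in,  '<', $file) or die "Could not open '$file': $!";
    open(my $out, '>', $tmp)  or die "Could not open '$tmp': $!";

    while (my $record = <$in>) {
        chomp $record;                                   # strip the trailing 'Query='
        next if $. > 1 && $record =~ /^\s*\Q$id\E\b/;    # skip the unwanted record
        print {$out} 'Query=' if $. > 1;                 # restore the separator
        print {$out} $record;
    }

    close $in;
    close $out or die "Could not close '$tmp': $!";

    # Replace the original only after the new file has been written completely.
    rename $tmp, $file or die "Could not rename '$tmp' to '$file': $!";
    return;
}
```

A call such as `remove_record("Cioin_PatchAnalysis.txt", "3374_Cioin");` would then leave only the remaining records in the original file. Writing to a temporary file first means the original stays untouched if the script dies partway through, which is not the case when the original is reopened with `'>'`.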

File editing in Perl
