Question

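I have a question about regular expressions. I have a file that I need to parse so that I can tell certain blocks of text apart. The blocks are separated by two empty lines (some are separated by 3 or 1 empty lines, but I only want the ones separated by exactly 2). Below is my code; /^\s*$^\s*$/ is the regex I thought should match, but it doesn't. What is wrong?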

$filename="yu";
open($in,$filename);
open(OUT,">>out.text");
while($str=<$in>)
{
unless($str = /^\s*$^\s*$/){
print "yes";
print OUT $str;
}
}
close($in);
close(OUT);

Cheers, Yulia

Answer 1

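By default, Perl reads a file one line at a time, so a single read never contains more than one newline. The following code changes the input record separator so that each read returns a chunk of text terminated by a double newline.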

    local $/ = "\n\n" ;

    while (<> ) {

      print "-- found $_" ;
    }
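To make the line-at-a-time behaviour concrete, here is a minimal sketch that reads from an in-memory string instead of a file (the sample text is made up for illustration) and counts the newlines in each record. With the default separator every record holds a single line, so a pattern meant to span two lines only ever has one line to work with.

```perl
use strict;
use warnings;

# Hypothetical sample input: two blocks separated by two empty lines.
my $sample = "first block\n\n\nsecond block\n";

# Open a read handle on the string instead of a real file.
open(my $fh, '<', \$sample) or die "Cannot open in-memory handle: $!";

my @newline_counts;
while (my $record = <$fh>) {
    # tr/\n// with an empty replacement counts newlines without changing $record.
    my $count = ($record =~ tr/\n//);
    push @newline_counts, $count;
}
close($fh);

print "newlines per record: @newline_counts\n";
```

Each of the four records (the two text lines and the two empty lines) carries exactly one newline.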

Answer 2

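New answer

After having trouble excluding runs of more than 2 empty lines, and after a good night's sleep, here is a better method that doesn't even need to slurp the file.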

#!/usr/bin/perl

use strict;
use warnings;    

my $file = 'yu';
my @blocks; #each element will be an arrayref, one per block
            #that referenced array will hold lines in that block

open(my $fh, '<', $file) or die "Cannot open $file: $!";

my $empty = 0;
my $block_num = 0;
while (my $line = <$fh>) {
  chomp($line);
  if ($line =~ /^\s*$/) {
    $empty++;
  } elsif ($empty == 2) { #not blank and exactly 2 previous blanks
    $block_num++; # move on to next block
    $empty = 0;
  } else {
    $empty = 0;
  }

  push @{ $blocks[$block_num] }, $line;
}

#write out each block to a new file
my $file_num = 1;
foreach my $block (@blocks) {
  open(my $out, '>', $file_num++ . ".txt") or die "Cannot open output file: $!";
  print $out join("\n", @$block);
}

In fact, rather than storing the blocks and writing them out later, you can write each block straight to its own file as you go:

#!/usr/bin/perl

use strict;
use warnings;

my $file = 'yu';

open(my $fh, '<', $file) or die "Cannot open $file: $!";

my $empty = 0;
my $block_num = 1;
open(OUT, '>', $block_num . '.txt') or die "Cannot open $block_num.txt: $!";
while (my $line = <$fh>) {
  chomp($line);
  if ($line =~ /^\s*$/) {
    $empty++;
  } elsif ($empty == 2) { #not blank and exactly 2 previous blanks
    close(OUT); #just learned this line isn't necessary, perldoc -f close
    open(OUT, '>', ++$block_num . '.txt') or die "Cannot open $block_num.txt: $!";
    $empty = 0;
  } else {
    $empty = 0;
  }

  print OUT "$line\n";
}

close(OUT);
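The counting logic can also be tried without touching the file system. The following is only a sketch: split_blocks is a hypothetical helper name, the input lines are made up, and, like the scripts above, it keeps the blank lines inside the blocks they belong to.

```perl
use strict;
use warnings;

# split_blocks (hypothetical helper) takes a list of already-chomped lines
# and returns a list of array refs, one per block. A new block starts only
# when a non-blank line follows exactly two blank lines.
sub split_blocks {
    my @lines  = @_;
    my @blocks = ([]);
    my $empty  = 0;
    for my $line (@lines) {
        if ($line =~ /^\s*$/) {
            $empty++;
        } else {
            push @blocks, [] if $empty == 2;
            $empty = 0;
        }
        push @{ $blocks[-1] }, $line;
    }
    return @blocks;
}

# Made-up input: one blank line, then two, then three.
my @blocks = split_blocks('a', '', 'b', '', '', 'c', '', '', '', 'd');
printf "%d blocks\n", scalar @blocks;
print join('|', @$_), "\n" for @blocks;
```

Only the run of two blank lines between b and c starts a new block; the single blank line and the run of three stay inside their blocks.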

Answer 3

Deprecated in favor of the new answer

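~~justintime's answer works by telling perl that you want the end of a line to be "\n\n", which is clever and works well.~~ One exception is that the separator must match exactly. Judging by the regex you are using, the "empty" lines may contain whitespace, in which case this will not work. Also, his method will split even on more than 2 line breaks, which the OP does not allow.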
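Both limitations are easy to reproduce. In this sketch, read_records is a hypothetical helper that reads an in-memory string with $/ set to "\n\n"; the three inputs are made up to show a single empty line, an "empty" line that holds a space, and three empty lines.

```perl
use strict;
use warnings;

# read_records (hypothetical helper) returns the records Perl reads from
# $text when the input record separator is "\n\n".
sub read_records {
    my ($text) = @_;
    local $/ = "\n\n";
    open(my $fh, '<', \$text) or die "Cannot open in-memory handle: $!";
    my @records = <$fh>;
    close($fh);
    return @records;
}

# One empty line between the blocks: this already splits.
my @single = read_records("a\n\nb\n");

# The "empty" line holds a space: "\n\n" never occurs, so nothing splits.
my @spaced = read_records("a\n \nb\n");

# Three empty lines: the separator matches twice, leaving a stray record.
my @triple = read_records("a\n\n\n\nb\n");

printf "single empty line: %d records\n", scalar @single;
printf "line with a space: %d records\n", scalar @spaced;
printf "three empty lines: %d records\n", scalar @triple;
```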

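For completeness, to do it the way you asked, you need to slurp the whole file into a variable (which is fine in most cases, as long as the file is not so large that it uses up all your memory).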

Then I would probably use the split function to break the text into an array of blocks. Your code would look something like:

#!/usr/bin/perl

use strict;
use warnings;

my $file = 'yu';
my $text;

open(my $fh, '<', $file);
{
  local $/; # enables slurp mode inside this block
  $text = <$fh>;
}
close($fh);

my @blocks = split(
  /
  (?<!\n)\n       # newline that ends a block, not preceded by another \n
  [^\S\n]*\n      # first whitespace-only line (spaces or tabs, no newline)
  [^\S\n]*\n      # second whitespace-only line
  (?![^\S\n]*\n)  # make sure a third blank line does not follow
  /x, # x flag allows comments and whitespace in regex
  $text
);

You can then work with the array. If I understand your comment on justintime's answer, you want to write each block to a different file. That would look like

my $file_num = 1;
foreach my $block (@blocks) {
  open(my $out, '>', $file_num++ . ".txt") or die "Cannot open output file: $!";
  print $out $block;
}

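Note that because $out is opened lexically (with my), the variable goes away (i.e. "goes out of scope") when you reach the end of the foreach block. When that happens to a lexical filehandle, the file is closed automatically. You can do something similar with justintime's method: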

local $/ = "\n\n" ;

my $file_num = 1;
while (<>) {
  open(my $out, '>', $file_num++ . ".txt") or die "Cannot open output file: $!";
  print $out $_; # the current record is in $_; there is no $block here
}
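The automatic close of a lexical filehandle can be checked with a small sketch. It writes into a temporary directory created with File::Temp (the file name 1.txt is just an illustrative choice), never calls close on the output handle, and then reads the file back.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write into a throwaway directory so the sketch leaves nothing behind.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/1.txt";    # hypothetical output name

{
    open(my $out, '>', $path) or die "Cannot open $path: $!";
    print $out "block text\n";
    # No explicit close: $out goes out of scope at the closing brace,
    # and Perl flushes and closes the handle at that point.
}

# Read the file back to confirm the buffered data was written.
open(my $in, '<', $path) or die "Cannot open $path: $!";
my $content = do { local $/; <$in> };
close($in);

print "read back: $content";
```

An explicit close is still worth considering when you want to check its return value, for example to detect a full disk.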

Answer 4

use 5.012;

open my $fh,'<','1.txt' or die "Cannot open 1.txt: $!";

#slurping file
local $/;
my $content = <$fh>;

close $fh;

for my $block ( split /(?<!\n)\n\n\n(?!\n)/,$content ) {
    say 'found:';
    say $block;
}
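A quick way to check that this pattern only fires on exactly two empty lines is to run it against small made-up strings. In the sketch below, count_pieces is a hypothetical helper, and the inputs assume the empty lines are truly empty (no spaces or tabs).

```perl
use strict;
use warnings;

# count_pieces (hypothetical helper) returns how many pieces the split
# pattern from the answer above produces for a given string.
sub count_pieces {
    my ($text) = @_;
    my @pieces = split /(?<!\n)\n\n\n(?!\n)/, $text;
    return scalar @pieces;
}

my $one   = count_pieces("a\n\nb");        # one empty line
my $two   = count_pieces("a\n\n\nb");      # two empty lines
my $three = count_pieces("a\n\n\n\nb");    # three empty lines

print "one: $one, two: $two, three: $three\n";
```

Only the input with two empty lines is split into two pieces; the lookbehind and lookahead stop the pattern from matching inside a longer run of newlines.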

How to match exactly two empty lines

4 answers: