Question

@matches = ( $filestr =~ /^[0-9]+\. (.+\n)*/mg );

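I have a file read into $filestr. The regex above should match the start of a line followed by a number, a dot and a space, and then any number of lines that each end in a newline, so the match should stop at a line that contains only a newline. For some reason, though, it only seems to produce a few single lines from the file.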

When I do something like

@matches = ( $filestr =~ /^[0-9]+\. .+\n/mg );

I correctly match one line.

When I do

@matches = ( $filestr =~ /^[0-9]+\. .+\n.+\n/mg );

I match the same single line, followed by some seemingly unrelated lines. What is wrong with my regex?

Note: the regex works fine in this regex tester: https://regex101.com/. It only misbehaves in Perl.

For example, given this text:

1. This should
match

2. This should too

3. This
one
also

the regex should match

1. This should
match

and

2. This should too

and

3. This
one
also

Answer 1

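Your regex is right. However, you are capturing only part of each match. When a pattern contains capturing groups, m//g in list context returns the captured groups rather than the whole match, and a repeated capturing group such as (.+\n)* keeps only its last repetition. That is why @matches ends up holding only the last line of each item. I suggest capturing the whole match in a single group instead, because that group is what gets stored in @matches.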
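A minimal sketch that reproduces the symptom, assuming the question's sample text is held in a double-quoted string (`$str` and `@last_lines` are illustrative names only). Each match contributes only `$1`, and `$1` holds the last repetition of the group:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The question's sample text, written with explicit newlines.
my $str = "1. This should\nmatch\n\n2. This should too\n\n3. This\none\nalso\n";

# Original pattern: (.+\n)* is a capturing group, so list-context m//g
# returns $1 for every match, and $1 keeps only the LAST repetition.
my @last_lines = $str =~ /^[0-9]+\. (.+\n)*/mg;

# Prints: "Got: match", "Got: This should too", "Got: also" (one per line)
print "Got: $_" for @last_lines;
```

The whole items are still being matched; they are just not what the list assignment returns.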

So the corrected regex becomes /(^[0-9]+\. (?:.+\n)*)/gm. This way each complete match is captured into $1. Wrapped in a program, it looks like this:

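It would also work without those outer parentheses (...): when the pattern contains no capturing groups at all, m//g in list context returns the whole match (what $& would hold) for each match. Either way, remember that in cases like this the repeated part should use a non-capturing group (?: ... ) rather than a capturing group ( ... ).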

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $str = '
1. This should
match

2. This should too

3. This
one
also
';

# One capturing group around the whole item; the repeated line part is non-capturing.
my @matches = $str =~ /^([0-9]+\. (?:.+\n)*)/gm;

print Dumper(\@matches);

Output:

$VAR1 = [
          '1. This should
match
',
          '2. This should too
',
          '3. This
one
also
'
        ];
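The no-parentheses alternative mentioned above can be sketched like this, again assuming the sample text is in `$str` (`@whole` and `@via_loop` are illustrative names). Variant B uses scalar-context m//g in a loop and reads `$&` directly. On Perl versions older than 5.20, using `$&` anywhere slows down every regex match; there, the /p flag with `${^MATCH}` is the usual substitute.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = "1. This should\nmatch\n\n2. This should too\n\n3. This\none\nalso\n";

# Variant A: no capturing groups at all, so list-context m//g
# returns each complete match.
my @whole = $str =~ /^[0-9]+\. (?:.+\n)*/mg;

# Variant B: scalar-context m//g in a loop; $& is the current whole match.
my @via_loop;
while ($str =~ /^[0-9]+\. (?:.+\n)*/mg) {
    push @via_loop, $&;
}

print "A: $_" for @whole;
print "B: $_" for @via_loop;
```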

Answer 2

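In this case you should read the file paragraph by paragraph instead of line by line. To do that, set $/ to the empty string, which switches readline into paragraph mode. For example: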

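use strict;
use warnings;
use Data::Dumper;

my @result;

{
    local $/ = "";    # paragraph mode: one record per blank-line-separated block
    while (<DATA>) {
        chomp;        # in paragraph mode, chomp removes all trailing newlines
        push @result, $_ ;
        # or, to keep only paragraphs that start with a number and a dot, use instead:
        # push @result, $_ if /^[0-9]+\./;
    }
}

print Dumper(\@result);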


__DATA__
1. This should
match

2. This should too

3. This
one
also

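Since the question already has the whole file in $filestr, here is a sketch of the same paragraph-mode idea applied to that string. It assumes Perl 5.8 or later for the in-memory filehandle (`open` on a scalar reference); the sample contents of `$filestr` and the name `@paragraphs` are stand-ins for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the question's $filestr (assumed already read from the file).
my $filestr = "1. This should\nmatch\n\n2. This should too\n\n3. This\none\nalso\n";

my @paragraphs;
{
    # Read the string through an in-memory filehandle.
    open my $fh, '<', \$filestr or die "Cannot open in-memory handle: $!";
    local $/ = "";    # paragraph mode
    while (my $para = <$fh>) {
        chomp $para;  # strips all trailing newlines in paragraph mode
        push @paragraphs, $para if $para =~ /^[0-9]+\./;
    }
    close $fh;
}

print "<$_>\n" for @paragraphs;
```

Each element is one numbered item without its trailing newlines, so no multi-line regex is needed at all.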
Why doesn't this Perl regex work?
