Question

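I need to extract several sections from a multiline string in Perl. I apply the same regex in a while loop. My problem is capturing the last section, which ends at the end of the file. My workaround is to append the marker to the string, so that the regex always finds a terminator. Is there a better way?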

Example file:

Header

==== /home/src/file1.c#1 ====
content file1
line 1 of file1
line 2 of file1
line 3 of file1

another line of file1

==== /home/src/file2.c#1 ====
content file2
line 1 of file2
line 2 of file2
line 3 of file2

another line of file2

With that, the script finds both sections. How can I avoid appending the marker?

Answer 1

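Using the non-greedy modifier ? is a huge red flag. You can usually get away with using it once in a pattern, but more than once is usually a bug. To match text that does not contain a given string, use the following instead: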

(?:(?!STRING).)*

Applied to this problem, that gives the following pattern:

/
   ^==== [ ] (?<filename> [^\n]+ ) [ ] ====\n
   (?<content> (?:(?! ^==== ).)* )
/xsmg
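
The reason no appended marker is needed can be seen in isolation: the construct stops either just before the forbidden string or at the end of the input, whichever comes first. The following is a minimal sketch of that behaviour; text_before() is a hypothetical helper name and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# text_before() is a hypothetical helper for this illustration only.
# It returns the leading part of $text up to (not including) the first
# occurrence of $stop, or all of $text when $stop never occurs.
sub text_before {
    my ($text, $stop) = @_;

    # Each '.' is consumed only if $stop does not start at that position.
    # /s lets '.' match newlines; /x allows the spacing inside the pattern.
    my ($before) = $text =~ / \A ( (?: (?! \Q$stop\E ) . )* ) /xs;
    return $before;
}

print "<<", text_before("alpha beta END gamma", "END"), ">>\n";  # <<alpha beta >>
print "<<", text_before("no marker here",       "END"), ">>\n";  # <<no marker here>>
```

The second call shows the case that matters for the last section of a file: with no terminator present, the match simply runs to the end of the string.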

Code:

use strict;
use warnings;

# Slurp the whole DATA section into a single string.
my $desc = do { local $/; <DATA> };

while (
   $desc =~ /
      ^==== [ ] (?<filename> [^\n]+ ) [ ] ====\n
      (?<content> (?:(?! ^==== ).)* )
   /xsmg
) {
   print "filename=<<$+{filename}>>\n";
   print "content=<<$+{content}>>\n";
}

__DATA__
Header

==== /home/src/file1.c#1 ====
content file1
line 1 of file1
line 2 of file1
line 3 of file1

another line of file1

==== /home/src/file2.c#1 ====
content file2
line 1 of file2
line 2 of file2
line 3 of file2

another line of file2

Output:

filename=<</home/src/file1.c#1>>
content=<<content file1
line 1 of file1
line 2 of file1
line 3 of file1

another line of file1

>>
filename=<</home/src/file2.c#1>>
content=<<content file2
line 1 of file2
line 2 of file2
line 3 of file2

another line of file2
>>
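
If the sections are needed as data rather than printed, the same loop can collect them. This is only a sketch under assumptions: extract_sections() is a hypothetical name, the return shape (a list of [filename, content] pairs in file order) is one possible choice, and the short input string is invented for the example. Named captures require Perl 5.10 or later.

```perl
use strict;
use warnings;

# extract_sections() is a hypothetical helper, not part of the answer above.
# It returns a list of [filename, content] array refs in file order.
sub extract_sections {
    my ($desc) = @_;
    my @sections;

    # Same pattern as above: the content runs until the next line that
    # starts with '====' or until the end of the string.
    while (
        $desc =~ /
            ^==== [ ] (?<filename> [^\n]+ ) [ ] ====\n
            (?<content> (?:(?! ^==== ).)* )
        /xsmg
    ) {
        push @sections, [ $+{filename}, $+{content} ];
    }
    return @sections;
}

# Invented sample input, shaped like the example file.
my $desc = "Header\n\n==== a.c#1 ====\none\n\n==== b.c#2 ====\ntwo\n";

for my $section ( extract_sections($desc) ) {
    my ($filename, $content) = @$section;
    print "filename=<<$filename>>\n";
    print "content=<<$content>>\n";
}
```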

Answer 2

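You have made this much more awkward for yourself by slurping the whole file first. It is relatively simple if you read the file line by line: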

use strict;
use warnings 'all';

my $file;

while ( <> ) {
    if ( /^====\s+(.*\S)#\S*\s+====/ ) {
        $file = $1;
        print "filename=$file\n";
        print 'content=';
    }
    elsif ( $file ) {
        print;
    }
}

Output

filename=/home/src/file1.c
content=content file1
line 1 of file1
line 2 of file1
line 3 of file1

another line of file1

filename=/home/src/file2.c
content=content file2
line 1 of file2
line 2 of file2
line 3 of file2

another line of file2

Or, if you need to store the entire content of each file, perhaps in a hash, it would look like this:

use strict;
use warnings 'all';

my $file;
my %data;

while ( <> ) {
    if ( /^====\s+(.*\S)#\S*\s+====/ ) {
        $file = $1;
    }
    elsif ( $file ) {
        $data{$file} .= $_;
    }
}

for my $file ( sort keys %data ) {
    print "filename=$file\n";
    print "content=$data{$file}";
}

The output is identical to that of the first version above.
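
Note that the hash version prints files in sorted key order, which matches the input order here only because file1 sorts before file2. Where the original order matters, one option is to record the order alongside the hash. The sketch below assumes a hypothetical read_sections() helper and uses an in-memory filehandle with invented input so that it is self-contained; it also tests defined $file rather than truth, so a file literally named 0 would still be handled.

```perl
use strict;
use warnings;

# read_sections() and its return shape are assumptions made for this sketch.
# It returns (\@order, \%data): filenames in first-seen order, and a hash
# mapping each filename to its accumulated content.
sub read_sections {
    my ($fh) = @_;
    my ( $file, @order, %data );

    while ( my $line = <$fh> ) {
        if ( $line =~ /^====\s+(.*\S)#\S*\s+====/ ) {
            $file = $1;
            push @order, $file unless exists $data{$file};
            $data{$file} //= '';    # keep files that have no content lines
        }
        elsif ( defined $file ) {
            $data{$file} .= $line;
        }
    }
    return ( \@order, \%data );
}

# An in-memory filehandle stands in for a real input file here.
my $input = "Header\n\n==== /src/z.c#1 ====\nz line\n\n==== /src/a.c#3 ====\na line\n";
open my $fh, '<', \$input or die "Cannot open in-memory handle: $!";
my ( $order, $data ) = read_sections($fh);
close $fh;

for my $file (@$order) {
    print "filename=$file\n";
    print "content=$data->{$file}";
}
```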
