Question

I'm having some trouble sorting and extracting multi-line text. Here is my code:

my $searched = $doc->content;
    if($searched =~ /MODIFIED files in Task $_[1] : (.*?) The/gs){
        print $1,"\n";
        $Modified = $1;

    }
    if($searched =~ m/COMPILED in Task $_[1] : (.*?) The/ms){
        $Compiled = $1;

    }
    if($searched =~ m/DELETED in Task $_[1] : (.*?) Comments/ms){
        $Deleted = $1;

    }

Here is a sample of the text file:

The following are the MODIFIED files in Task 50104 :

**Directory                Filename                Version
---------                --------                -------
Something                Something                .....
......                   ......                   .....
.......                  ........                 .....**

The following are the files to be COMPILED in Task 50104 :

**Directory                Filename
---------                --------
.........               .........**


The following are the files to be DELETED in Task 50104 :

**Directory                Filename
---------                --------**

Comments:
 Blah blah.......

The text between the ** markers is what I want to extract. Sorry about the poor formatting.

Answer 1

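I'm not sure your text actually contains the spaces your regex expects after the : and before The/Comments (in fact, it looks to me as though the : is followed by a newline and The is preceded by a newline, not by a space). Instead of: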

if($searched =~ /MODIFIED files in Task $_[1] : (.*?) The/gs){

try:

if($searched =~ /MODIFIED files in Task $_[1] :(.*?)The/gs){

I also don't think you need the /g or /m switches...

If that doesn't work, I suggest building the regex up step by step: first make sure that /MODIFIED files in Task $_[1] :/ matches on its own, then add the rest.
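As a rough sketch of that suggestion, the snippet below runs the loosened pattern against a shortened copy of the sample from the question. The heredoc and the $task variable are stand-ins (assumptions) for $doc->content and $_[1]; \s* is used instead of a literal space so that either spaces or newlines are accepted around the captured block.

```perl
use strict;
use warnings;

# Stand-in for $doc->content, shortened from the question's sample (assumption)
my $searched = <<'END';
The following are the MODIFIED files in Task 50104 :

Directory                Filename                Version
---------                --------                -------
Something                Something                .....

The following are the files to be COMPILED in Task 50104 :

Directory                Filename
---------                --------
.........               .........

END

my $task = 50104;    # stands in for $_[1]

# \s* accepts spaces or newlines on either side of the captured block;
# /s lets . match newlines so the capture can span several lines.
my ($modified) =
    $searched =~ /MODIFIED files in Task \Q$task\E :\s*(.*?)\s*The/s;

print defined $modified ? "$modified\n" : "no match\n";
```

The same pattern shape should work for the COMPILED and DELETED sections, with Comments as the terminator for the last one, as in the original code.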

Answer 2

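Flip-flop operator to the rescue!

The flip-flop operator has a left-hand side and a right-hand side. Once the left side evaluates to true, the flip-flop stays true until the right side evaluates to true.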

use strict;
use warnings;

my $searched = $doc->content;

my %info;  #< Store in a hash >

open my $string, '<', \$searched or die $!;

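{
    my ( $type, $content );

    # Store the collected block under the heading that preceded it
    my $store = sub {
        return unless defined $type && defined $content;
        $content =~ s{\s+$}{};    # Don't need that trailing whitespace
        $info{$type} = $content;  # Or push @{ $info{$type} }, $content;
        undef $type;
        undef $content;
    };

    while ( <$string> ) {  # Process $searched line-by-line

        # Collect everything from the "Directory" line through the next blank line
        if ( /^Directory/ .. /^\s*$/ ) {
            $content .= $_;
            next;
        }

        $store->();  # A block (if any) has just ended

        # Remember which section the next block belongs to
        $type = $1 if /(MODIFIED|COMPILED|DELETED)/;
    }

    $store->();  # In case the input ends while still inside a block
}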
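To see the flip-flop behaviour in isolation, here is a minimal sketch on a handful of made-up lines (the contents of @lines are purely illustrative, not taken from the real file). The range becomes true on the line matching /^Directory/ and stays true up to and including the first blank line.

```perl
use strict;
use warnings;

# Made-up input lines for illustration only
my @lines = (
    "intro",
    "Directory  Filename",
    "---------  --------",
    "a  b",
    "",
    "outro",
);

my @kept;
for (@lines) {
    # True from the /^Directory/ line through the first blank line, inclusive
    push @kept, $_ if /^Directory/ .. /^\s*$/;
}

print scalar(@kept), " lines kept\n";    # prints "4 lines kept"
```

Note that the blank line that closes the range is itself included, which is why the trailing whitespace is stripped before storing the block.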

Answer 3

Here's a quick hack (untested). Rather than reading the whole file into a string, process it line by line:

$ script.pl inputfile.txt

use strict;
use warnings;

my %data;
my $header;
while (<>) {
    last if /^Comments:/; # stop before the trailing comments section
    next if /^\s*$/; # skip empty lines
    if (/^The following are /) { # header line
        if (/(MODIFIED|COMPILED|DELETED)/) {
            $header = $1;
        } else { die "Bad header: $_" }
    } else { # data line
        die "Header expected" unless (defined $header);
        $data{$header} .= $_;
    }
}

How can I extract multiple lines here using Perl?
