How do I extract multiple lines here using Perl?

Asked: 2011-07-27 16:46:30

Tags: regex perl

I'm having some trouble sorting through and extracting multi-line text. Here is my code:

my $searched = $doc->content;
    if($searched =~ /MODIFIED files in Task $_[1] : (.*?) The/gs){
        print $1,"\n";
        $Modified = $1;

    }
    if($searched =~ m/COMPILED in Task $_[1] : (.*?) The/ms){
        $Compiled = $1;

    }
    if($searched =~ m/DELETED in Task $_[1] : (.*?) Comments/ms){
        $Deleted = $1;

    }

Here is a sample of the text file:

The following are the MODIFIED files in Task 50104 :

**Directory                Filename                Version
---------                --------                -------
Something                Something                .....
......                   ......                   .....
.......                  ........                 .....**

The following are the files to be COMPILED in Task 50104 :

**Directory                Filename
---------                --------
.........               .........**


The following are the files to be DELETED in Task 50104 :

**Directory                Filename
---------                --------**

Comments:
 Blah blah.......

The text between the ** markers is what I want to extract. Sorry for the poor formatting.

3 Answers:

Answer 0: (score: 1)

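I'm not sure your text actually contains a space after the : and before The/Comments. In fact, it looks to me as though the : is followed by a newline, and The is preceded by a newline rather than a space. So instead of: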

if($searched =~ /MODIFIED files in Task $_[1] : (.*?) The/gs){

try:

if($searched =~ /MODIFIED files in Task $_[1] :(.*?)The/gs){

I also don't think you need the /g or /m switches. You do need /s, so that . can match the newlines inside the block.

If that doesn't work, I suggest building the regex up step by step: first make sure that /MODIFIED files in Task $_[1] :/ matches on its own, then add the rest.
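A minimal sketch of this advice, assuming the header line is followed by newlines rather than a single space. The sample text and the `$task` variable (standing in for `$_[1]`) are made up for illustration. `\s*` on both sides of the capture accepts spaces as well as newlines, so the exact whitespace no longer matters:

```perl
use strict;
use warnings;

# Hypothetical stand-in for $doc->content, shaped like the question's sample
my $searched = <<'END';
The following are the MODIFIED files in Task 50104 :

Directory    Filename    Version
---------    --------    -------
src          main.c      1.2

The following are the files to be COMPILED in Task 50104 :
END

my $task = 50104;    # stands in for $_[1]

my $modified;
# /s lets . match newlines; \s* absorbs the whitespace around the table
if ( $searched =~ /MODIFIED files in Task \Q$task\E :\s*(.*?)\s*The/s ) {
    $modified = $1;
    print "$modified\n";
}
```

Because the capture is non-greedy, it stops at the first following "The", which here is the start of the next section header.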

Answer 1: (score: 1)

Flip-flop operator to the rescue!

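The flip-flop operator (..) has a left-hand side and a right-hand side. Once the left side evaluates to true, the flip-flop stays true until the right side evaluates to true; the line on which the right side matches is still included.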
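A minimal sketch of that behaviour in isolation, using made-up lines and made-up BEGIN/END delimiters rather than the question's data:

```perl
use strict;
use warnings;

# Made-up input: we want everything from BEGIN through END, inclusive
my @lines = ( 'skip me', 'BEGIN', 'keep 1', 'keep 2', 'END', 'skip me too' );

my @kept;
for (@lines) {
    # True from the line matching /^BEGIN/ through the line matching /^END/
    push @kept, $_ if /^BEGIN/ .. /^END/;
}

print "$_\n" for @kept;    # prints BEGIN, keep 1, keep 2, END
```

The same idea applied to the question's input: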

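use strict;
use warnings;

my $searched = $doc->content;  # $doc comes from the question's code

my %info;  # Section name (MODIFIED, COMPILED, DELETED) => table text

# Open the string as an in-memory file so it can be read line by line
open my $string, '<', \$searched or die $!;

{
    my ( $type, $content );

    while ( <$string> ) {  # Process $searched line-by-line

        # Collect from the "Directory" line through the next blank line
        $content .= $_, next if /^Directory/ .. /^\s*$/;

        # A table has just ended: store it under the header seen before it
        if ( defined $type && defined $content ) {

            $content =~ s{\s+$}{}; # Don't need that trailing whitespace
            $info{$type} = $content;  # Or push @{ $info{$type} }, $content;
            undef $type;
            undef $content;
        }

        # Only now remember the section name, so that a new header does not
        # overwrite $type before the previous table has been stored
        $type = $1 if /(MODIFIED|COMPILED|DELETED)/;
    }
}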

Answer 2: (score: 0)

Here is a quick hack (untested). Instead of reading the whole file into a string, process it line by line:

$ script.pl inputfile.txt

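use strict;
use warnings;

my %data;    # section name => accumulated table lines
my $header;  # section currently being read
while (<>) {
    next if /^\s*$/;         # skip empty lines
    last if /^Comments:/;    # stop before the trailing comments
    if (/^The following are /) { # header line
        if (/(MODIFIED|COMPILED|DELETED)/) {
            $header = $1;
        } else { die "Bad header: $_" }
    } else { # data line
        die "Header expected" unless (defined $header);
        $data{$header} .= $_;
    }
}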
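
To try the line-by-line approach without a file on disk, here is a self-contained sketch that reads from an in-memory filehandle. The sample text is made up in the question's layout (without the ** markers), and stopping at "Comments:" is an assumption about where the tables end:

```perl
use strict;
use warnings;

# Hypothetical sample in the question's layout
my $text = <<'END';
The following are the MODIFIED files in Task 50104 :

Directory    Filename    Version
---------    --------    -------
src          main.c      1.2

The following are the files to be COMPILED in Task 50104 :

Directory    Filename
---------    --------
src          main.c

Comments:
 Blah blah
END

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my ( %data, $header );
while ( my $line = <$fh> ) {
    next if $line =~ /^\s*$/;         # skip empty lines
    last if $line =~ /^Comments:/;    # assumption: tables end at "Comments:"
    if ( $line =~ /^The following are .*?(MODIFIED|COMPILED|DELETED)/ ) {
        $header = $1;                 # start of a new section
        next;
    }
    die "Header expected" unless defined $header;
    $data{$header} .= $line;          # data line belongs to current section
}
close $fh;

print "== $_ ==\n$data{$_}" for sort keys %data;
```

Each value in %data keeps its line breaks, so a section can later be split into rows with split /\n/.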