Question

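I have an ASCII log file with some content I would like to extract. I have never taken the time to learn Perl properly, but I figure it is a good tool for this task.

The file is structured like this: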

... 
... some garbage 
... 
... garbage START
what i want is 
on different
lines 
END 
... 
... more garbage ...
next one START 
more stuff I want, again
spread 
through 
multiple lines 
END 
...
more garbage

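So, I am looking for a way to extract the lines between each pair of START and END delimiter strings. How can I do this?

So far, I have only found examples of how to print the line containing the START string, and other documentation that is only loosely related to what I am looking for.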

Answer 1

You want the flip-flop operator (better known as the range operator) ..

#!/usr/bin/env perl
use strict;
use warnings;

while (<>) {
  if (/START/../END/) {
    next if /START/ || /END/;
    print;
  }
}

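Replace the call to print with whatever you actually want to do (e.g., push the line into an array, edit it, format it, and so on). I use next to skip the lines that actually contain START or END, but you may not want that behavior. For a discussion of this operator and other useful Perl special variables, see this article.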
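As an illustration of the "push the line into an array" variant, here is a minimal sketch. The helper name extract_blocks and the in-memory sample log are assumptions made for the example, not part of the answer above. It relies on the documented fact that, in scalar context, the range operator returns a sequence number (1 on the line that opens the range, and a value ending in "E0" on the line that closes it), so the delimiter lines can be recognized without matching them a second time.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): returns one array ref per
# START..END block, holding the chomped lines between the delimiters.
sub extract_blocks {
    my ($fh) = @_;
    my @blocks;
    while (my $line = <$fh>) {
        # Empty string outside a block, otherwise the sequence number.
        my $seq = ($line =~ /START/ .. $line =~ /END/) or next;
        if ($seq == 1) { push @blocks, []; next }   # START line: open a new block
        next if $seq =~ /E0$/;                      # END line: nothing to store
        chomp $line;
        push @{ $blocks[-1] }, $line;
    }
    return @blocks;
}

# Demo on an in-memory log shaped like the one in the question.
my $log = join "\n", '... garbage START', 'what i want is', 'on different',
                     'lines', 'END', '... more garbage ...', 'next one START',
                     'more stuff', 'END', '';
open my $fh, '<', \$log or die "Cannot read in-memory log: $!";
my @blocks = extract_blocks($fh);
printf "block %d: %s\n", $_ + 1, join(' | ', @{ $blocks[$_] }) for 0 .. $#blocks;
```

To run it against real files, pass \*ARGV (the handle behind <>) instead of the in-memory handle.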

Answer 2

From perlfaq6's answer to How can I pull out lines between two patterns that are themselves on different lines?

You can use Perl's somewhat exotic .. operator (documented in perlop):

perl -ne 'print if /START/ .. /END/' file1 file2 ...

If you want text rather than lines, you can use

perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...

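But if you want nested occurrences of START through END, you will run into the problem described in this section's question on matching balanced text.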
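To make the nesting caveat concrete, here is a small sketch of the same regex applied to an in-memory string. The helper name between_markers and the sample strings are assumptions made for this example. The non-greedy .*? stops at the nearest END, so with nested markers the inner START ends up inside the captured text.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text found between each START/END pair.
sub between_markers {
    my ($text) = @_;
    my @chunks;
    # /s lets . match newlines; /g walks through every match in turn.
    while ($text =~ /START(.*?)END/gs) {
        push @chunks, $1;
    }
    return @chunks;
}

# Two separate blocks, the first spanning a line break.
print "[$_]\n" for between_markers("junk START one\ntwo END junk START three END\n");

# Nested markers: only one match, and it stops at the first END.
print "[$_]\n" for between_markers("START a START b END c END");
```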

Here is another example of using ..:

while (<>) {
    $in_header =   1  .. /^$/;
    $in_body   = /^$/ .. eof;
# now choose between them
} continue {
    $. = 0 if eof;  # fix $.
}
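One way the "choose between them" step could look is sketched below. This is an assumption-laden illustration rather than the perlfaq6 code: the helper name split_message and the sample message are made up, and the left operand is written as an explicit $. == 1 test instead of the constant 1 (which perlop documents as being compared against $. implicitly).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of perlfaq6): splits a mail-like message
# into header lines and body lines at the first blank line.
sub split_message {
    my ($fh) = @_;
    my (@header, @body);
    while (my $line = <$fh>) {
        # True from the first line read ($. == 1) through the first blank line.
        my $in_header = ($. == 1) .. ($line =~ /^$/);
        if    ($in_header && $line =~ /^$/) { next }                # separator line
        elsif ($in_header)                  { push @header, $line }
        else                                { push @body,   $line }
    }
    return (\@header, \@body);
}

my $message = "Subject: test\nFrom: someone\n\nfirst body line\n\nsecond body line\n";
open my $fh, '<', \$message or die "Cannot read in-memory message: $!";
my ($header, $body) = split_message($fh);
printf "%d header lines, %d body lines\n", scalar @$header, scalar @$body;
```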

Answer 3

How can I grab multiple lines after a matching line in Perl?

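How about that one? In that question the END string is $^; you can change it to your own END string.

I am also a novice, but the solutions there offer quite a few approaches. Let me know more specifically how what you want differs from the question linked above.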

Answer 4

while (<>) {
    chomp;      # strip record separator
    if(/END/) { $f=0;}
    if (/START/) {
        s/.*START//g;
        $f=1;
    }
    print $_ ."\n" if $f;
}

Try to write some code next time.

Answer 5

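After Telemachus's reply, things started pouring out. This is the solution I am working on.

I am trying to extract lines delimited by two strings on different lines (one is a line ending with "CINFILE="; the other is a line containing a single "#"), excluding the delimiter lines. I can do this with Telemachus's solution.
The first line has a space that I want to remove. I am including that as well.
I am also trying to extract each set of lines into a separate file.

This works for me, although the code could be classified as ugly; that is because I am virtually a newcomer to Perl at the moment. Anyway, here it is: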

#!/usr/bin/env perl
use strict;
use warnings;

my $start='CINFILE=$';
my $stop='^#$';
my $filename;
my $output;
my $counter=1;
my $found=0;

while (<>) {
  if (/$start/../$stop/) {
    $filename=sprintf("boletim_%06d.log",$counter);
    open($output,'>>'.$filename) or die $!;
    next if /$start/ || /$stop/;
    if($found == 0) { print $output (split(/ /))[1]; }
    else { print $output $_; }
    $found=1;
  } else { if($found == 1) { close($output); $counter++; $found=0; } }
}

I hope it is useful to others as well. Cheers.

Answer 6

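Coming from a "virtual newbie", that is not too bad. One thing you could do is put the "$found=1" inside the "if($found == 0)" block, so that you do not perform that assignment on every line between $start and $stop.

Another thing that is a bit ugly, in my opinion, is that you open the same filehandle every time you enter the $start/$stop block.

This shows a way around that: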

#!/usr/bin/perl

use strict;
use warnings;

my $start='CINFILE=$';
my $stop='^#$';
my $filename;
my $output;
my $counter=1;
my $found=0;

while (<>) {

    # Find block of lines to extract                                                           
    if( /$start/../$stop/ ) {

        # Start of block                                                                       
        if( /$start/ ) {
            $filename=sprintf("boletim_%06d.log",$counter);
            open($output,'>>'.$filename) or die $!;
        }
        # End of block                                                                         
        elsif ( /$stop/ ) {
            close($output);
            $counter++;
            $found = 0;
        }
        # Middle of block                                                                      
        else{
            if($found == 0) {
                print $output (split(/ /))[1];
                $found=1;
            }
            else {
                print $output $_;
            }
        }

    }
    # Find block of lines to extract                                                           

}
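A variation on the same idea, sketched under a few assumptions: the helper name split_blocks, the $dir parameter, and the in-memory sample log are made up for this example, and the leading space on the first content line is assumed to be a single space at the start of that line. Instead of a $found flag, it uses the sequence number that the range operator returns in scalar context (1 on the opening delimiter, a value ending in "E0" on the closing one), so each file is opened once and closed once per block.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: writes each block read from $in to its own numbered
# file under $dir and returns the list of file names it created.
sub split_blocks {
    my ($in, $dir) = @_;
    my ($out, @files);
    while (my $line = <$in>) {
        my $seq = ($line =~ /CINFILE=$/ .. $line =~ /^#$/) or next;
        if ($seq == 1) {                       # opening delimiter line
            my $name = sprintf "%s/boletim_%06d.log", $dir, scalar(@files) + 1;
            open $out, '>>', $name or die "Cannot open $name: $!";
            push @files, $name;
        }
        elsif ($seq =~ /E0$/) {                # closing delimiter line
            close $out or die "Cannot close output file: $!";
        }
        else {
            $line =~ s/^ // if $seq == 2;      # first content line: drop leading space
            print {$out} $line;
        }
    }
    return @files;
}

# Demo: an in-memory log and a temporary output directory.
my $log = "junk\nheader CINFILE=\n first line\nsecond line\n#\njunk\n"
        . "other CINFILE=\n third line\n#\n";
open my $in, '<', \$log or die "Cannot read in-memory log: $!";
my $dir = tempdir(CLEANUP => 1);
print "$_\n" for split_blocks($in, $dir);
```

Passing \*ARGV as the input handle and '.' as the directory would give roughly the behavior of the script above.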
