How to grep and capture a multi-line pattern from a file in Perl

Date: 2015-11-04 19:15:50

Tags: regex string perl

I have a file that looks like this:

Random words go here
/attribute1
/attribute2
/attribute3="all*the*things*I'm*interested*in*are*inside*here**
and*it*goes*into*the*next*line.*blah*blah*blah*foo*foo*foo*foo*
bar*bar*bar*bar*random*words*go*here*until*the*end*of*the*sente
nce.*I*think*we*have*enough*words"

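I want to grep the file for the line containing /attribute3=, and then save the string found inside the quotation marks into a separate variable.

Here is what I have so far: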

#!/bin/perl
use warnings; use strict;
my $file = "data.txt";
open(my $fh, '<', $file) or die $!;
while (my $line = <$fh>) {
    if ($line =~ /\/attribute3=/g){
        print $line . "\n";
    }
}

This prints /attribute3="all*the*things*I'm*interested*in*are*inside*here**

What I want is all*the*things*I'm*interested*in*are*inside*here**and*it*goes*into*the*next*line.*blah*blah*blah*foo*foo*foo*foo*bar*bar*bar*bar*random*words*go*here*until*the*end*of*the*sentence.*I*think*we*have*enough*words

So what I tried next is:

#!/bin/perl
use warnings; use strict;
my $file = "data.txt";
open(my $fh, '<', $file) or die $!;
my $part_I_want;
while (my $line = <$fh>) {
    if ($line =~ /\/attribute3=/g){
        $line =~ /^/\attribute3=\"(.*?)/;   # capture everything after the quotation mark
        $part_I_want .= $1;   # the capture group; save the stuff on line 1
        # keep adding to the string until we reach the closing quotation marks
        next (unless $line =~ /\"/){
             $part_I_want .= $_;    
        }
    }
}

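The code above does not work. How do I grep and capture a multi-line pattern between two characters (in this case, quotation marks)?

3 Answers:

Answer 0 (score: 2)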

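# Slurp the whole input by locally undefining the record separator $/
my $str = do { local($/); <DATA> };
# [^"]* also matches newlines, so the capture can span several lines
$str =~ /attribute3="([^"]*)"/ or die "attribute3 not found";
$str = $1;
$str =~ s/\n//g;   # remove the line breaks; the desired output has no gaps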

__DATA__
Random words go here
/attribute1
/attribute2
/attribute3="all*the*things*I'm*interested*in*are*inside*here**
and*it*goes*into*the*next*line.*blah*blah*blah*foo*foo*foo*foo*
bar*bar*bar*bar*random*words*go*here*until*the*end*of*the*sente
nce.*I*think*we*have*enough*words"

Answer 1 (score: 1)

Read the entire file into a single variable and use /attribute3=\"([^\"]*)\"/ms
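
As a rough sketch of what this could look like in a script: the helper name extract_attribute3 and the command-line handling below are made up for illustration and are not part of the original answer. The /m and /s modifiers are left out because the pattern uses neither ^, $ nor a dot, so they would have no effect here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name chosen for this sketch): returns the text between
# the quotes that follow /attribute3=, with line breaks removed, or undef if
# the attribute is not present.
sub extract_attribute3 {
    my ($text) = @_;
    # [^"]* matches any character except a double quote, newlines included,
    # so the capture may span several lines.
    return undef unless $text =~ m{/attribute3="([^"]*)"};
    my $value = $1;
    $value =~ s/\r?\n//g;    # join the wrapped lines back together
    return $value;
}

# Slurp the file named on the command line, e.g.: perl extract.pl data.txt
if (@ARGV) {
    my $file = shift @ARGV;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    my $text = do { local $/; <$fh> };    # $/ undefined: read everything at once
    close($fh);
    my $value = extract_attribute3($text);
    print defined $value ? "$value\n" : "attribute3 not found\n";
}
```

Keeping the match in a small function makes it easy to check against a string before pointing it at a real file. Slurping is fine for small files; for very large inputs a line-by-line loop may be preferable.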

Answer 2 (score: 1)

From the command line:

perl -n0e '/\/attribute3="(.*)"/s && print $1' foo.txt 

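This is basically what you have, but the -0 flag is the equivalent of undef $/ in code. From the man page:

  -0[octal/hexadecimal]

  Specifies the input record separator ($/) as an octal or hexadecimal number. If there are no digits, the null character is the separator.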
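
To make the effect of the record separator concrete, here is a small sketch that reads the first record from an in-memory string under three settings of $/. The helper first_record and the sample text are invented for this illustration. Strictly speaking, a bare -0 sets $/ to the NUL character rather than undefining it; for a text file that contains no NUL bytes the result is the same, namely a single record holding the whole file. The perlrun documentation lists -0777 as the conventional way to slurp a file whole.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration: read the first record from a string
# using the given record separator (undef means "no separator at all").
sub first_record {
    my ($text, $separator) = @_;
    open(my $fh, '<', \$text) or die "Cannot open in-memory handle: $!";
    local $/ = $separator;
    my $record = <$fh>;
    close($fh);
    return $record;
}

my $text = qq{/attribute2\n/attribute3="one*\ntwo"\n};

# Default separator: the first record is only the first line.
print "newline: ", length(first_record($text, "\n")), " chars\n";
# What a bare -0 sets: NUL as separator; with no NUL in the data,
# the first record is the entire text.
print "NUL:     ", length(first_record($text, "\0")), " chars\n";
# undef $/: the first record is always the entire text.
print "undef:   ", length(first_record($text, undef)), " chars\n";
```

With the whole text in one record, a single regex with the /s modifier (or a negated character class such as [^"]*) can match across the original line breaks.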