Searching for and printing regex matches in Perl

Date: 2014-06-13 20:16:26

Tags: regex perl

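I need to search an input file for a regular expression that matches several times on each line, and print each match on its own line. A sample input line: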

"1-BBMD-DC-FB"|4|{47|"Interval"|00:00:00|00:00:00|1}{48|"Interval"|00:00:00|00:00:00|1}{49|"Interval"|00:00:00|00:00:00|1}{52|"Interval"|00:00:00|00:00:00|1}|{1|"Interval"|"All"|0}|{52|"Interval"|"day"} 

The desired output is:

1-BBMD-DC-FB"|{47|"Interval"|00:00:00|00:00:00|1}
1-BBMD-DC-FB"|{48|"Interval"|00:00:00|00:00:00|1}
....

How can I achieve this? This is what I tried:

while (<IN>) {
    my ($a,$s,$d,$f,$g,$h,$j) = split (/{/, $_);
    #print ("$a \n");
    print ("$a$s \n");
    print ("$a$d \n");
    print ("$a$d \n");
    print ("$a$f \n");
    print ("$a$g \n");
    print ("$a$h \n");
}
close IN;

6 answers:

Answer 0 (score: 0)

A crude parser (it parses some parts twice), but a simple one:

my $field_re = qr/ "[^"]*" | [^{|}]* /x;
my $curlies_re = qr/ \{ (?: $field_re (?: \| $field_re )* )? \} /x;

while (<>) {
   my ($id, $curlies) = / ^ ( $field_re ) \| $field_re \| ( $curlies_re* ) \| /x
      or die("Invalid input or bad parser\n");

   my @curlies = $curlies =~ /$curlies_re/g;

   print("$id|$_\n") for @curlies;
}

Output:

"1-BBMD-DC-FB"|{47|"Interval"|00:00:00|00:00:00|1}
"1-BBMD-DC-FB"|{48|"Interval"|00:00:00|00:00:00|1}
"1-BBMD-DC-FB"|{49|"Interval"|00:00:00|00:00:00|1}
"1-BBMD-DC-FB"|{52|"Interval"|00:00:00|00:00:00|1}

Answer 1 (score: 0)

http://www.regexplanet.com/advanced/perl/index.html

This site lets you test various regular expressions against your data and then turn them into code.

I used {*} as the regex. According to RegexPlanet:

    $var = $input =~ $regex
$var=1
$`="1-BBMD-DC-FB"|4|{47|"Interval"|00:00:00|00:00:00|1
$&=}
$'={48|"Interval"|00:00:00|00:00:00|1}{49|"Interval"|00:00:00|00:00:00|1}{52|"Interval"|00:00:00|00:00:00|1}|{1|"Interval"|"All"|0}|{52|"Interval"|"day"}

It is a good place to experiment.
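
The same three values can also be inspected locally, without the website. The following is a minimal sketch, not part of the original answer: the helper name match_parts and the shortened sample string are assumptions made for illustration. It uses the built-in @- and @+ offset arrays rather than $`, $& and $', and writes the pattern as \{*\} so the braces are explicitly literal.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (prematch, match, postmatch) for the first
# match of $regex in $input, or an empty list if there is no match.
sub match_parts {
    my ($input, $regex) = @_;
    return unless $input =~ $regex;
    # $-[0] and $+[0] are the start and end offsets of the last successful match.
    return (
        substr($input, 0, $-[0]),
        substr($input, $-[0], $+[0] - $-[0]),
        substr($input, $+[0]),
    );
}

# Shortened sample line (an assumption; the real input has more groups).
my $input = '"1-BBMD-DC-FB"|4|{47|"Interval"|00:00:00|00:00:00|1}{48|"Interval"|00:00:00|00:00:00|1}';
my ($pre, $match, $post) = match_parts($input, qr/\{*\}/);
print "pre:   $pre\nmatch: $match\npost:  $post\n";
```

As on RegexPlanet, the match is only the first closing brace, which suggests that this pattern on its own does not capture a whole {...} group.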

Answer 2 (score: 0)

Looking at your input, there is an optional | character after some of the closing braces.

To print what you want, replace }\|? with }\n:

$subject =~ s/\}\|?/}\n/g;

Output:

1-BBMD-DC-FB"|4|{47|"Interval"|00:00:00|00:00:00|1}
{48|"Interval"|00:00:00|00:00:00|1}
{49|"Interval"|00:00:00|00:00:00|1}
{52|"Interval"|00:00:00|00:00:00|1}
{1|"Interval"|"All"|0}
{52|"Interval"|"day"}

To split instead:

@result = split(m/}\|?/, $subject, 0);
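
Note that split discards whatever the pattern matches, so with the pattern above each element loses its closing }. The following is a sketch of one way to keep the brace, under the assumption that a lookbehind is acceptable here; the helper name split_groups and the shortened sample string are illustrative and not from the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: split a record after each "}" while keeping the brace.
# The lookbehind (?<=\}) matches the position right after a "}" without
# consuming it; \|? then consumes the optional "|" separator.
sub split_groups {
    my ($subject) = @_;
    return split /(?<=\})\|?/, $subject;
}

# Shortened sample line (an assumption; the real input has more groups).
my $subject = '"1-BBMD-DC-FB"|4|{47|"Interval"|00:00:00|00:00:00|1}{48|"Interval"|00:00:00|00:00:00|1}|{1|"Interval"|"All"|0}';
print "$_\n" for split_groups($subject);
```

The first element still carries the "1-BBMD-DC-FB"|4| prefix, so it would need separate handling to produce the output requested in the question.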

Answer 3 (score: 0)

my $test = '"1-BBMD-DC-FB"|4|{47|"Interval"|00:00:00|00:00:00|1}{48|"Interval"|00:00:00|00:00:00|1}{49|"Interval"|00:00:00|00:00:00|1}{52|"Interval"|00:00:00|00:00:00|1}|{1|"Interval"|"All"|0}|{52|"Interval"|"day"}';

my ($prefix, @list) = split(/\{/, $test);  # split with "{" as delimiter
$prefix =~ s/4\|//g;          # Remove "4|" after the prefix

foreach my $item (@list) {
    $item =~ s/\|$//g;        # Remove "|" that some entries have between "}{"
    # Put "{" back before each element. ${prefix} keeps Perl from parsing
    # "$prefix{" as the start of a hash element lookup.
    print "${prefix} {$item\n";
}

Output:

"1-BBMD-DC-FB"| {47|"Interval"|00:00:00|00:00:00|1}
"1-BBMD-DC-FB"| {48|"Interval"|00:00:00|00:00:00|1}
"1-BBMD-DC-FB"| {49|"Interval"|00:00:00|00:00:00|1}
"1-BBMD-DC-FB"| {52|"Interval"|00:00:00|00:00:00|1}
"1-BBMD-DC-FB"| {1|"Interval"|"All"|0}
"1-BBMD-DC-FB"| {52|"Interval"|"day"}

Answer 4 (score: 0)

Using good ole split:

use strict;
use warnings;

while (<DATA>) {
    chomp;

    # Split on each | that is not inside a {...} group
    my ($name, $num, $groups, $all, $day) = split /\|(?![^\{\}]*\})/;

    # Separate groups in third field.
    for my $group (split /(?=\{)/, $groups) {
        print "$name|$group\n";
    }
}

__DATA__
"1-BBMD-DC-FB"|4|{47|"Interval"|00:00:00|00:00:00|1}{48|"Interval"|00:00:00|00:00:00|1}{49|"Interval"|00:00:00|00:00:00|1}{52|"Interval"|00:00:00|00:00:00|1}|{1|"Interval"|"All"|0}|{52|"Interval"|"day"} 

Output:

"1-BBMD-DC-FB"|{47|"Interval"|00:00:00|00:00:00|1}
"1-BBMD-DC-FB"|{48|"Interval"|00:00:00|00:00:00|1}
"1-BBMD-DC-FB"|{49|"Interval"|00:00:00|00:00:00|1}
"1-BBMD-DC-FB"|{52|"Interval"|00:00:00|00:00:00|1}

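The negative lookahead in the first split can be checked in isolation. The following is a small sketch on a made-up, shortened record; the helper name top_level_fields and the sample string are assumptions, and the approach relies on the braces being balanced and not nested.

```perl
use strict;
use warnings;

# Hypothetical helper: split on "|" only when it is outside a {...} group.
# The negative lookahead rejects a "|" if a "}" can be reached from it
# without crossing another "{" or "}", i.e. if the "|" sits inside braces.
sub top_level_fields {
    my ($line) = @_;
    return split /\|(?![^{}]*\})/, $line;
}

# Made-up record with the same shape as the question's input.
my @fields = top_level_fields('"id"|4|{1|"a"}{2|"b"}|{3|"c"}');
print scalar(@fields), " fields\n";
print "  $_\n" for @fields;
```

The adjacent {...}{...} groups stay together in a single field, which is why the answer needs a second split on the position before each {.
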
Answer 5 (score: 0)

$stringName = qq{"1-BBMD-DC-FB"|4|{47|"Interval"|00:00:00|00:00:00|1}{48|"Interval"|00:00:00|00:00:00|1}{49|"Interval"|00:00:00|00:00:00|1}{52|"Interval"|00:00:00|00:00:00|1}|{1|"Interval"|"All"|0}|{52|"Interval"|"day"} };
$stringName =~ s|\}\{|\}\n\{|g;

This breaks them onto new lines using the \n character, as you wanted. If needed, you can then split on \n to get an array.
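
The follow-up step of splitting on \n could look like the sketch below. It is not from the original answer: the helper name break_groups and the sample string are illustrative, and the added \|? is an assumption so that groups separated by }|{ are also broken apart, which the substitution above leaves on one line.

```perl
use strict;
use warnings;

# Hypothetical helper: insert a newline between adjacent groups, then split
# the result into a list with one element per line.
sub break_groups {
    my ($s) = @_;
    # "}{" and "}|{" both become "}\n{"; the \|? is an addition to the
    # answer's pattern, which only handles "}{".
    $s =~ s/\}\|?\{/}\n{/g;
    return split /\n/, $s;
}

# Made-up record with the same shape as the question's input.
my @lines = break_groups('"id"|4|{1|"a"}{2|"b"}|{3|"c"}');
print "$_\n" for @lines;
```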