Perl regular expression pattern matching

Time: 2015-10-16 08:44:38

Tags: perl

The input is a GMF file:

CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Non-Smartphone Package|3126|GB| | 
CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Smartphone Package|3126|GB| | 
CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Non-Smartphone Package -  Charged|3126|GB|7500000|234446

In my Perl code, I use the following to extract the strings from each line:
if($line=~m/^(CUSTEVSUMMROW_GPRS|CUSTEVSUMMROW).*?\s(.*?)\|(\d+)\|.*\|(.*?)$/)
{
    $tag=$1;
    $lineTxt=$2;
    $usage = $3;
    $amt = $4;
}

Output:

tag :: CUSTEVSUMMROW_GPRS  lineTxt :: GPRS - Nova Subscriber Non-Smartphone Package  usage :: 3126  amt ::
tag :: CUSTEVSUMMROW_GPRS  lineTxt :: GPRS - Nova Subscriber Smartphone Package  usage :: 3126  amt ::
tag :: CUSTEVSUMMROW_GPRS  lineTxt :: GPRS - Nova Subscriber Non-Smartphone Package - Charged usage :: 3126 amt :: 234446

How can I retrieve and print the unit of the usage, i.e. MB or GB? Can anyone help me?

2 answers:

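Answer 0 (score: 3)

You do not capture the column after \d+. Add parentheses around it.

.* is greedy, i.e. it matches as much as possible. Add ? to make it frugal (non-greedy), so that it stops at the first pipe that lets the rest of the pattern match.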

if ($line =~ /^(CUSTEVSUMMROW_GPRS|CUSTEVSUMMROW).*?\s(.*?)\|(\d+)\|(.*?)\|/)
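To see why the quantifier matters here, the following minimal sketch compares a greedy group, a frugal group, and a negated character class on the last sample line. The variable names ($greedy, $lazy, $class) are only illustrative, and the character-class form is an alternative not mentioned in the answer.

```perl
use strict;
use warnings;

my $line = 'CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Non-Smartphone Package -  Charged|3126|GB|7500000|234446';

# Greedy: (.*) grabs the rest of the line, then backtracks only to the LAST pipe.
my ($greedy) = $line =~ /\|\d+\|(.*)\|/;

# Frugal: (.*?) grows one character at a time and stops at the FIRST pipe.
my ($lazy) = $line =~ /\|\d+\|(.*?)\|/;

# Alternative (an assumption, not from the answer): a negated character
# class can never run past a pipe, so no backtracking is involved.
my ($class) = $line =~ /\|\d+\|([^|]*)\|/;

print "greedy: $greedy\n";    # greedy: GB|7500000
print "lazy:   $lazy\n";      # lazy:   GB
print "class:  $class\n";     # class:  GB
```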

You can also rewrite the alternation as

(CUSTEVSUMMROW(?:_GPRS)?)
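Putting both suggestions together, a small self-contained sketch might look like the following. It runs the pattern with the rewritten alternation over the three sample lines from the question; the @lines and @results arrays and the output format are assumptions made for the example.

```perl
use strict;
use warnings;

my @lines = (
    'CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Non-Smartphone Package|3126|GB| | ',
    'CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Smartphone Package|3126|GB| | ',
    'CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Non-Smartphone Package -  Charged|3126|GB|7500000|234446',
);

my @results;    # one [tag, lineTxt, usage, units] entry per matching line
for my $line (@lines) {
    # (?:_GPRS)? makes the suffix optional without creating a capture group,
    # so $2..$4 keep the same meaning as in the answer's pattern.
    if ( $line =~ /^(CUSTEVSUMMROW(?:_GPRS)?).*?\s(.*?)\|(\d+)\|(.*?)\|/ ) {
        my ( $tag, $lineTxt, $usage, $units ) = ( $1, $2, $3, $4 );
        push @results, [ $tag, $lineTxt, $usage, $units ];
        print "tag :: $tag  lineTxt :: $lineTxt  usage :: $usage  units :: $units\n";
    }
}
```

Copying $1..$4 into lexical variables immediately after the match avoids relying on capture variables that a later match would overwrite.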

Answer 1 (score: 1)

Given what you have there:

if($line=~m/^(CUSTEVSUMMROW_GPRS|CUSTEVSUMMROW).*?\s(.*?)\|(\d+)\|(.*?)\|(.*?)$/)
{
    $tag=$1;
    $lineTxt=$2;
    $usage = $3;
    $units = $4;
    $amt = $5;
}
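One thing worth checking with this pattern: the last group is followed directly by $, so on a line with five pipe-delimited columns it has to extend to the end of the line and takes the remaining columns with it. The sketch below shows this on the last sample line, along with a hypothetical variant (not from the answer) that skips the middle column and captures only the final one, as the pattern in the question did.

```perl
use strict;
use warnings;

my $line = 'CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Non-Smartphone Package -  Charged|3126|GB|7500000|234446';

# Pattern as given in the answer: $5 must reach the end of the line.
my @as_given = $line =~
    /^(CUSTEVSUMMROW_GPRS|CUSTEVSUMMROW).*?\s(.*?)\|(\d+)\|(.*?)\|(.*?)$/;

# Hypothetical variant: the greedy .*\| consumes everything up to the
# last pipe, so the final group holds only the last column.
my @last_only = $line =~
    /^(CUSTEVSUMMROW_GPRS|CUSTEVSUMMROW).*?\s(.*?)\|(\d+)\|(.*?)\|.*\|(.*?)$/;

print "as given : units=$as_given[3] amt=$as_given[4]\n";      # amt=7500000|234446
print "last only: units=$last_only[3] amt=$last_only[4]\n";    # amt=234446
```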

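However, I'd suggest that this is not the best way to tackle the problem. I would consider using split on the pipe character and handling your first field separately.

Something like this, perhaps: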

#!/usr/bin/env perl
use strict;
use warnings;

use Data::Dumper;

my @fields = qw ( tag lineTxt usage units amt );

while (<DATA>) {
    my ( $first_field, @record )  = split '\|';

    #split the first field on _just_ the first space.
    unshift( @record, $first_field =~ m/^(\w+) (.*)$/ );

    #use a hash slice to put that record into a hash of named keys.
    my %data;
    @data{@fields} = @record;
    print Dumper \%data;

    # can of course, make this an array of hashes quite easily. 
}


__DATA__
CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Non-Smartphone Package|3126|GB| | 
CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Smartphone Package|3126|GB| | 
CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Non-Smartphone Package -  Charged|3126|GB|7500000|234446

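This prints each record as a hash. For the last input line the output is shown below; because @fields has only five keys, the sixth value on that line (234446) is discarded and amt holds 7500000: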

$VAR1 = {
          'units' => 'GB',
          'tag' => 'CUSTEVSUMMROW_GPRS',
          'amt' => '7500000',
          'usage' => '3126',
          'lineTxt' => 'GPRS - Nova Subscriber Non-Smartphone Package -  Charged'
        };
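The comment in the script mentions that this can be turned into an array of hashes. A minimal sketch of that idea follows, under a few assumptions: the lines are held in an array rather than read from DATA, a sixth key named charge (a hypothetical name) is added so that the last column is not dropped, and split is given a limit of -1 so trailing empty fields are kept.

```perl
use strict;
use warnings;

# 'charge' is a hypothetical name for the fourth pipe-delimited column.
my @fields = qw( tag lineTxt usage units charge amt );

my @lines = (
    'CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Non-Smartphone Package|3126|GB| | ',
    'CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Smartphone Package|3126|GB| | ',
    'CUSTEVSUMMROW_GPRS GPRS - Nova Subscriber Non-Smartphone Package -  Charged|3126|GB|7500000|234446',
);

my @records;
for my $line (@lines) {
    # Strip a trailing newline if there is one (needed when reading from a file).
    chomp( my $rec = $line );

    # A limit of -1 tells split to keep trailing empty fields.
    my ( $first_field, @rest ) = split /\|/, $rec, -1;

    # Split the first field on just the first space.
    my ( $tag, $lineTxt ) = $first_field =~ m/^(\w+) (.*)$/;

    # Hash slice: assign all named keys in one statement.
    my %data;
    @data{@fields} = ( $tag, $lineTxt, @rest );
    push @records, \%data;
}

# The unit asked about in the question is now available by name.
print "$_->{lineTxt}: $_->{usage} $_->{units}\n" for @records;
```

Each element of @records is a hash reference, so the unit of the last line, for instance, is $records[2]{units}.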