Perl - Store part of a regex in a variable and use it in a match

Date: 2013-04-20 18:15:56

Tags: regex perl variables match

I have a file that contains long lines like this:

XEP.101     :1804 000000:I:XEPInfoFormat:Status=ok:TID=00000000516F6161-000874C3-00003E19-62F2B0C6:CallType=gprs:CallStart=20130415210553:CallDuration=4334:ServedParty=724044024363999:ServedLocation=724:OtherParty=TIM:OtherLocation=tim.br:ServedZone=ZO00001:OtherZone=ZP32363:TariffZone=ZN1261:CUST_ID=58922505:CO_ID=58891164:account=8327813:MSISDN=554599836655:theoretical_cost_value=33.323525:BA_Line_Main_value=NA:Tariff=TM_PL5PR:FU_Packs_used=FU_PLWI2:SNCODE_FU=1350_1250_1_BA_FU_PLWI2_Byt_Internet2:MCs_used=NO:bcd=20100319,bcp=P1M:InputFilename=201304172345.000020:EipFilename=/gold/rte/data/RatedEvents/EIP/10/101201304172345.000020:RtxFilename=/gold/rte/data/RatedEvents/RTX/01/OPSCGOLD_20130418000000_1011917.xml:BadrateFilename=/gold/rte/data/BadRate/bad_rate_xep10.201304172345.000020.tmp:FILE=/gold/rte/data/IncomingCDRs/ASN1/010/GPRS99+GPRS99-46299-1304172357-SA.TTF;TICKET=660

I have this condition in Perl to match such a line:

if ( $line =~ m/XEP.[0-9].*:(\d{4}) (\d{2})(\d{2})(\d{2}).*XEPInfoFormat:Status=(\w*):TID=(\S*):CallType=(\w*):CallStart=(\d*):CallDuration=(\d*):ServedParty=(\d*):ServedLocation=(\d*):OtherParty=(\w*):OtherLocation=(\w*):ServedZone=(\w*):OtherZone=(\w*):TariffZone=(\w*):CUST_ID=(\d*):CO_ID=(\d*):account=(\d*):MSISDN=(\d*):theoretical_cost_value=(\d*)\.(\d*):BA_Line_Main_value=(\w*):Tariff=(\w*):FU_Packs_used=(\w*):SNCODE_FU=(\w*):MCs_used=(\w*):bcd=(\d*),bcp=(\w*):InputFilename=(\d*)\.(\d*):EipFilename=\/\w*\/\w*\/\w*\/\w*\/\w*\/(\d*)\/(\d*)\.(\d*).*FILE=\/\w*\/\w*\/\w*\/\w*\/\w*\/(\d*)\/(\w*)\+(\w*)-(\d*)-(\d*)-(\w*).(\w*);TICKET=(\d*)/ ) {

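That works for me: it matches and gives me the results. However, I want to make it more flexible. For example, I want to match the whole line but let one field of the match (for example, the value after TID=) be specified as an option to my script. So what I'm trying to do is: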

use Getopt::Std;
getopts("Ch:t:",\%opts);

if ( $opts{t} ) {
    $TIDS = $opts{t};
} else {
    $TIDS = '/S*';
}

So I'm trying to make my match interpolate the variable $TIDS, which is set with the getopts -t option:

if ( $line =~ m/XEP.[0-9].*:(\d{4}) (\d{2})(\d{2})(\d{2}).*XEPInfoFormat:Status=(\w*):TID=(${TIDS})

So if I pass an argument with the -t option, for example:

perl-script.pl -t 888894343

I want my full regex to behave like this:

if ( $line =~ m/XEP.[0-9].*:(\d{4}) (\d{2})(\d{2})(\d{2}).*XEPInfoFormat:Status=(\w*):TID=(888894343)

But if I don't specify it, I want it to match like this:

if ( $line =~ m/XEP.[0-9].*:(\d{4}) (\d{2})(\d{2})(\d{2}).*XEPInfoFormat:Status=(\w*):TID=(/S*)

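I know I could simply match every line with (/S*) and then add a simple if condition like this:

print "$line\n" if $6 eq $TIDS;

but that costs performance, because there are many lines like the one above, so I would like the match itself to be flexible.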

Does anyone have any ideas? I tried using quotemeta, and wrapping my regex in single quotes and in double quotes, but nothing worked.

3 answers:

Answer 0 (score: 0)

If you are trying to use quotemeta on a variable (such as a command-line argument), you need to do something like this:

$foo = quotemeta($ARGV[0]);
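To illustrate what quotemeta changes, here is a minimal sketch that compares interpolating a raw value with interpolating a quoted one. The helper names tid_matches and tid_matches_inline and the shortened sample line are hypothetical; the \Q...\E form is Perl's built-in inline equivalent of quotemeta.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the TID field of $line is exactly $tid.
# quotemeta() backslash-escapes every non-word character, so a '.' in $tid
# matches only a literal dot instead of any character.
sub tid_matches {
    my ($line, $tid) = @_;
    my $quoted = quotemeta($tid);
    return $line =~ /:TID=$quoted:/ ? 1 : 0;
}

# Same test written inline: \Q...\E applies quotemeta to the interpolated text.
sub tid_matches_inline {
    my ($line, $tid) = @_;
    return $line =~ /:TID=\Q$tid\E:/ ? 1 : 0;
}

my $line = 'Status=ok:TID=0000-A1xB2:CallType=gprs';   # shortened, made-up sample
my $tid  = '0000-A1.B2';

# Raw interpolation: the unescaped '.' matches the 'x' in the line.
my $raw = $line =~ /:TID=$tid:/ ? 1 : 0;
print "raw: $raw\n";                                     # raw: 1
print "quotemeta: ", tid_matches($line, $tid), "\n";     # quotemeta: 0
print "\\Q...\\E: ", tid_matches_inline($line, $tid), "\n";   # \Q...\E: 0
```

Quoting is only appropriate when the option holds a literal value such as a TID; a default that is meant to act as a pattern (such as \S*) must be interpolated without quotemeta.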

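Answer 1 (score: 0)

The main reason your code doesn't work is that you use '/S*', which matches a slash followed by zero or more S characters, instead of '\S*', which matches zero or more non-whitespace characters.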
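A minimal sketch of the corrected default, assuming the same getopts option string as the question. The pattern is cut down to a few fields for readability, and the helper parse_tid and the sample line are hypothetical. A user-supplied TID is quoted so it is matched literally, while the default stays a pattern, and the whole regex is stored in a variable with qr//.

```perl
use strict;
use warnings;
use Getopt::Std;

my %opts;
getopts('Ch:t:', \%opts);

# Default to '\S*' (backslash-S, non-whitespace), not '/S*'. Single quotes keep
# the backslash, so the string interpolates into the regex as \S*.
# A TID given with -t goes through quotemeta so it is matched literally.
my $tid_re = defined $opts{t} ? quotemeta($opts{t}) : '\S*';

# Store the compiled pattern in a variable with qr//.
# Only a few of the question's fields are kept here, for brevity.
my $line_re = qr/XEPInfoFormat:Status=(\w*):TID=($tid_re):CallType=(\w*)/;

# Hypothetical helper: return the captured TID, or undef if the line doesn't match.
sub parse_tid {
    my ($line) = @_;
    return $line =~ $line_re ? $2 : undef;
}

my $sample = 'XEP.101 :1804 000000:I:XEPInfoFormat:Status=ok:TID=0000516F-62F2:CallType=gprs:CallStart=1';
my $tid = parse_tid($sample);
print defined $tid ? "TID=$tid\n" : "no match\n";
```

Because \S* also matches ':', the TID capture relies on backtracking to stop before the literal :CallType= that follows it, as in the question's full pattern.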

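However, I think it is better not to use one big regex at all, but to split each record into fields with split /:/. Also, every field after the first four has the form name=value, so it is convenient to put those fields into a hash for easy access. Then you only need to check if ($opts{t} eq $params{TID}) { ... }

This code demonstrates the idea. I use Data::Dump to display the contents of the %params hash it builds. It isn't clear whether the information in the first four fields matters, but I have extracted it into @params just in case.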

use strict;
use warnings;

use Data::Dump;

my %opts = (t => 888894343);

while (my $line = <DATA>) {
  chomp $line;
  my %params = $line =~ /([^:=]+)=([^:=]+)/g;
  ddx \%params;
  #next if $opts{t} and $params{TID} ne $opts{t};
  my @params = (split /:/, $line, 5)[0..3];
  ddx \@params;
  #print $line;
}

__DATA__
XEP.101     :1804 000000:I:XEPInfoFormat:Status=ok:TID=00000000516F6161-000874C3-00003E19-62F2B0C6:CallType=gprs:CallStart=20130415210553:CallDuration=4334:ServedParty=724044024363999:ServedLocation=724:OtherParty=TIM:OtherLocation=tim.br:ServedZone=ZO00001:OtherZone=ZP32363:TariffZone=ZN1261:CUST_ID=58922505:CO_ID=58891164:account=8327813:MSISDN=554599836655:theoretical_cost_value=33.323525:BA_Line_Main_value=NA:Tariff=TM_PL5PR:FU_Packs_used=FU_PLWI2:SNCODE_FU=1350_1250_1_BA_FU_PLWI2_Byt_Internet2:MCs_used=NO:bcd=20100319,bcp=P1M:InputFilename=201304172345.000020:EipFilename=/gold/rte/data/RatedEvents/EIP/10/101201304172345.000020:RtxFilename=/gold/rte/data/RatedEvents/RTX/01/OPSCGOLD_20130418000000_1011917.xml:BadrateFilename=/gold/rte/data/BadRate/bad_rate_xep10.201304172345.000020.tmp:FILE=/gold/rte/data/IncomingCDRs/ASN1/010/GPRS99+GPRS99-46299-1304172357-SA.TTF;TICKET=660

Output

# para.pl:11: {
#   account                => 8327813,
#   BA_Line_Main_value     => "NA",
#   BadrateFilename        => "/gold/rte/data/BadRate/bad_rate_xep10.201304172345.000020.tmp",
#   bcd                    => "20100319,bcp",
#   CallDuration           => 4334,
#   CallStart              => 20130415210553,
#   CallType               => "gprs",
#   CO_ID                  => 58891164,
#   CUST_ID                => 58922505,
#   EipFilename            => "/gold/rte/data/RatedEvents/EIP/10/101201304172345.000020",
#   FILE                   => "/gold/rte/data/IncomingCDRs/ASN1/010/GPRS99+GPRS99-46299-1304172357-SA.TTF;TICKET",
#   FU_Packs_used          => "FU_PLWI2",
#   InputFilename          => "201304172345.000020",
#   MCs_used               => "NO",
#   MSISDN                 => 554599836655,
#   OtherLocation          => "tim.br",
#   OtherParty             => "TIM",
#   OtherZone              => "ZP32363",
#   RtxFilename            => "/gold/rte/data/RatedEvents/RTX/01/OPSCGOLD_20130418000000_1011917.xml",
#   ServedLocation         => 724,
#   ServedParty            => 724044024363999,
#   ServedZone             => "ZO00001",
#   SNCODE_FU              => "1350_1250_1_BA_FU_PLWI2_Byt_Internet2",
#   Status                 => "ok",
#   Tariff                 => "TM_PL5PR",
#   TariffZone             => "ZN1261",
#   theoretical_cost_value => 33.323525,
#   TID                    => "00000000516F6161-000874C3-00003E19-62F2B0C6",
# }
# para.pl:14: ["    XEP.101     ", "1804 000000", "I", "XEPInfoFormat"]
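The code above extracts the name=value pairs with a regex rather than with split. In its output, bcd ends up as "20100319,bcp" and FILE swallows ";TICKET", because those pairs are separated by ',' and ';' instead of ':'. Here is a minimal sketch of the split-based version described in the text. It assumes, from the sample line only, that ':', ',' and ';' are all field separators and that values never contain those characters. The helper parse_record and the shortened sample line are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split a record into its four leading fields and a hash
# of name=value pairs. Treating ':', ',' and ';' as separators is an assumption
# based on the sample line ("bcd=...,bcp=..." and "...;TICKET=660").
sub parse_record {
    my ($line) = @_;
    my @fields = split /[:,;]/, $line;
    my @head   = @fields[0 .. 3];            # e.g. "XEP.101     ", "1804 000000", "I", "XEPInfoFormat"
    my %params;
    for my $field (@fields[4 .. $#fields]) {
        my ($name, $value) = split /=/, $field, 2;   # split on the first '=' only
        $params{$name} = $value if defined $value;
    }
    return (\@head, \%params);
}

my %opts = (t => '00000000516F6161-000874C3-00003E19-62F2B0C6');   # stands in for -t
my $line = 'XEP.101     :1804 000000:I:XEPInfoFormat:Status=ok:'
         . 'TID=00000000516F6161-000874C3-00003E19-62F2B0C6:CallType=gprs:'
         . 'bcd=20100319,bcp=P1M:FILE=/gold/rte/data/GPRS99-SA.TTF;TICKET=660';

my ($head, $params) = parse_record($line);
if (!$opts{t} or ($params->{TID} // '') eq $opts{t}) {
    print "$params->{bcd} $params->{bcp} $params->{TICKET}\n";   # 20100319 P1M 660
}
```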

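Answer 2 (score: 0)

Another suggestion: there is no need to check the TID value and parse the line in a single step. You can first run a very fast check on the record, and then parse it (with the hash technique or with a regex) only if it is worth it.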

while (my $line = <>) {
  # Cheap pre-filter; \Q...\E makes the TID match literally
  next if $opts{t} and $line !~ /:TID=\Q$opts{t}\E:/;
  # Parse and process record
}
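The same pre-filter can also be written as a plain substring search with Perl's built-in index, which needs no escaping at all. This is a minimal sketch; the helper name wanted and the sample lines are hypothetical. Whether it is actually faster than the regex is not guaranteed, since Perl already optimizes literal substrings inside patterns, so benchmark it on real data.

```perl
use strict;
use warnings;

# Hypothetical helper: cheap test to run before any full parse of a record.
# index() searches for a fixed string, so regex metacharacters in $tid need
# no escaping. The ':' delimiters stop a TID from matching a longer one.
sub wanted {
    my ($line, $tid) = @_;
    return 1 unless defined $tid && length $tid;   # no -t given: keep every line
    return index($line, ":TID=$tid:") >= 0 ? 1 : 0;
}

my @lines = (
    'XEPInfoFormat:Status=ok:TID=888894343:CallType=gprs',
    'XEPInfoFormat:Status=ok:TID=8888943430:CallType=gprs',
);

for my $line (@lines) {
    next unless wanted($line, '888894343');
    print "would parse: $line\n";   # only the first line reaches this point
}
```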