I want to parse a file that contains data like this:
05\/26\/2013 06:09:47 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ - uid=radash@abc.com\,ou=People\,o=zeb.com - 06:09:47 - http - uizweb_zam - - 2uid=bolched@abc.com
05\/26\/2013 06:09:48 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ - uid=rad-ash2s@abc.com\,ou=People\,o=zeb.com - 06:09:48 - http - uizweb_zam - - 2uid=bolchedssd@abc.com
05\/26\/2013 06:09:49 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac - 200.12.33.43 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ - uid=sjhsjdh@abc.com\,ou=People\,o=zeb.com - 06:09:49 - http - uizweb_zam - - 2uid=kjsdsdjhjsh@abc.com
and get:
05/26/2013 06:09:47 and uid=radash@abc.com,ou=People,o=zeb.com
05/26/2013 06:09:48 and uid=rad-ash2s@abc.com,ou=People,o=zeb.com
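I tried split(' - '), but I could not get it to work, because as you can see some lines, such as the second one above, contain rad-ash2s@abc.com (with a '-' in the middle). Sometimes other parts of the data contain a '-' as well.
Please help.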
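Answer 0: (score: 1)
You are better off using a regular expression. With a regular expression, I can use (...) capture groups to quickly pull out the parts of the string I want. See the Perldoc on Regular expressions for what the various regular expression metacharacters mean.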
#! /usr/bin/env perl
use 5.12.0;
use warnings;
use autodie;
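while ( my $line = <DATA> ) {
    chomp $line;
    $line =~ s/\\//g;    # Remove all backslashes
    # $1: everything before the first " -"; $2: the first "uid=..." token
    next unless $line =~ /^(.+?) -.+?(uid=\S+)/;    # skip lines that do not match
    my ( $date, $uid ) = ( $1, $2 );
    say qq($date and $uid);
}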
__DATA__
05\/26\/2013 06:09:47 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ - uid=radash@abc.com\,ou=People\,o=zeb.com - 06:09:47 - http - uizweb_zam - - 2uid=bolched@abc.com
05\/26\/2013 06:09:48 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ - uid=rad-ash2s@abc.com\,ou=People\,o=zeb.com - 06:09:48 - http - uizweb_zam - - 2uid=bolchedssd@abc.com
05\/26\/2013 06:09:49 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac - 200.12.33.43 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ - uid=sjhsjdh@abc.com\,ou=People\,o=zeb.com - 06:09:49 - http - uizweb_zam - - 2uid=kjsdsdjhjsh@abc.com
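If the one-line pattern above is hard to read, the same idea can be spelled out with the /x modifier and named captures (both need Perl 5.10 or later). This is only a sketch under the assumption that every line starts with a date, a time and a four-digit timezone offset, as in the sample data; the helper name parse_line is made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns (date, uid) for one log line, or nothing
# if the line does not look like the sample data.
sub parse_line {
    my ($line) = @_;
    $line =~ tr/\\//d;    # drop the backslash escapes
    return unless $line =~ m{
        ^ (?<date> \S+ \s+ \S+ )   # date and time, e.g. 05/26/2013 06:09:48
        \s+ [-+]\d{4}              # timezone offset, matched but not captured
        .*? \s-\s                  # skip ahead to a field boundary
        (?<uid> uid= \S+ )         # first field that starts with uid=
    }x;
    return ( $+{date}, $+{uid} );
}

# Second sample line from the question, the one with a hyphen inside the uid.
my $sample = '05\/26\/2013 06:09:48 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ - uid=rad-ash2s@abc.com\,ou=People\,o=zeb.com - 06:09:48 - http - uizweb_zam - - 2uid=bolchedssd@abc.com';

if ( my ( $date, $uid ) = parse_line($sample) ) {
    print "$date and $uid\n";
    # prints: 05/26/2013 06:09:48 and uid=rad-ash2s@abc.com,ou=People,o=zeb.com
}
```

Because \S+ stops at whitespace, the hyphen inside rad-ash2s is captured as part of the uid, while the trailing 2uid=... token is ignored because the lazy .*? stops at the first field that begins with uid=.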
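Answer 1: (score: 0)
This program does what you ask. It looks like the field separator is ' - ', i.e. a hyphen with a space on either side, so the hyphen inside rad-ash2s does not count as a separator, and the doubled '- -' near the end of each line corresponds to an empty second-to-last field (the eleventh).
The program expects the name of the input file as an argument on the command line.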
use strict;
use warnings;
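while (<>) {
    chomp;
    tr/\\//d;                          # delete all backslashes
    my @fields = split /\x20-\x20/;    # split on space, hyphen, space
    next unless @fields > 6;           # skip lines with too few fields
    printf "%s and %s\n", @fields[0,6];
}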
With your own data, this produces
05/26/2013 06:09:47 -0700 and uid=radash@abc.com,ou=People,o=zeb.com
05/26/2013 06:09:48 -0700 and uid=rad-ash2s@abc.com,ou=People,o=zeb.com
05/26/2013 06:09:49 -0700 and uid=sjhsjdh@abc.com,ou=People,o=zeb.com
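Note that the first field still carries the -0700 offset, which the question's desired output leaves out. A possible variation, sketched here with a made-up helper name date_and_uid and assuming the uid is always the seventh field, trims the offset from the first field after splitting:

```perl
use strict;
use warnings;

# Hypothetical helper: split one line on ' - ' and return (date, uid),
# or nothing if the line has too few fields.
sub date_and_uid {
    my ($line) = @_;
    $line =~ tr/\\//d;                          # remove the backslash escapes
    my @fields = split / - /, $line;            # hyphens without spaces stay intact
    return unless @fields > 6;                  # need at least seven fields
    my ($date) = $fields[0] =~ /^(\S+ \S+)/;    # date and time only, no offset
    return unless defined $date;
    return ( $date, $fields[6] );
}

# Second sample line from the question.
my $sample = '05\/26\/2013 06:09:48 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ - uid=rad-ash2s@abc.com\,ou=People\,o=zeb.com - 06:09:48 - http - uizweb_zam - - 2uid=bolchedssd@abc.com';

if ( my ( $date, $uid ) = date_and_uid($sample) ) {
    printf "%s and %s\n", $date, $uid;
    # prints: 05/26/2013 06:09:48 and uid=rad-ash2s@abc.com,ou=People,o=zeb.com
}
```

To run it over a file instead of a single string, the same helper could be called inside a while (<>) loop like the one above.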