Parsing a text file with Perl

Time: 2013-06-05 16:55:22

Tags: perl

I want to parse a file that contains data like this:

05\/26\/2013 06:09:47 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac  - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ  - uid=radash@abc.com\,ou=People\,o=zeb.com - 06:09:47 - http - uizweb_zam -  - 2uid=bolched@abc.com
05\/26\/2013 06:09:48 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac  - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ  - uid=rad-ash2s@abc.com\,ou=People\,o=zeb.com - 06:09:48 - http - uizweb_zam -  - 2uid=bolchedssd@abc.com
05\/26\/2013 06:09:49 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac  - 200.12.33.43 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ  - uid=sjhsjdh@abc.com\,ou=People\,o=zeb.com - 06:09:49 - http - uizweb_zam -  - 2uid=kjsdsdjhjsh@abc.com

and get:

05/26/2013 06:09:47  and uid=radash@abc.com,ou=People,o=zeb.com
05/26/2013 06:09:48  and uid=rad-ash2s@abc.com,ou=People,o=zeb.com

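I tried split(' - '), but that does not work because, as you can see, some lines, like the second one above, contain rad-ash2s@abc.com (with a '-' in the middle). Sometimes other parts of the data contain a '-' as well.

Please help.

2 answers:

Answer 0: (score: 1)

You are better off using a regular expression. With a regular expression, I can use (...) to quickly capture the parts of the string I want. See the Perldoc on Regular expressions for what the various regex metacharacters mean.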

#! /usr/bin/env perl

use 5.12.0;
use warnings;
use autodie;

while ( my $line = <DATA> ) {
    chomp $line;
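    $line =~ s/\\//g;    # Remove all backslashes
    # Skip lines that do not match, so $1 and $2 never hold stale values
    next unless $line =~ /^(.+?) -.+?(uid=\S+)/;
    my $date = $1;       # everything before the first " -", i.e. date and time
    my $uid  = $2;       # the first uid=... entry, up to the next whitespace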
    say qq($date and $uid);
}

__DATA__
05\/26\/2013 06:09:47 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac  - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ  - uid=radash@abc.com\,ou=People\,o=zeb.com - 06:09:47 - http - uizweb_zam -  - 2uid=bolched@abc.com
05\/26\/2013 06:09:48 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac  - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ  - uid=rad-ash2s@abc.com\,ou=People\,o=zeb.com - 06:09:48 - http - uizweb_zam -  - 2uid=bolchedssd@abc.com
05\/26\/2013 06:09:49 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac  - 200.12.33.43 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ  - uid=sjhsjdh@abc.com\,ou=People\,o=zeb.com - 06:09:49 - http - uizweb_zam -  - 2uid=kjsdsdjhjsh@abc.com
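
As a variation on the same idea, here is a minimal sketch that takes the captures in list context and spells the pattern out with the /x flag. The helper name extract_date_uid is hypothetical, and the pattern assumes every line starts with MM/DD/YYYY HH:MM:SS and that the wanted entry is the first field beginning with uid=.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (date, uid) for one log line, or an
# empty list when the line does not fit the layout assumed above.
sub extract_date_uid {
    my ($line) = @_;
    $line =~ tr/\\//d;    # drop the backslash escapes
    return $line =~ m{
        ^ (\d\d/\d\d/\d{4} \s \d\d:\d\d:\d\d)   # date and time, without the UTC offset
        \s .*? \s-\s                            # skip lazily to a separator
        (uid=\S+)                               # first field that begins with uid=
    }x;
}

my @samples = (
    '05\/26\/2013 06:09:48 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac  - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ  - uid=rad-ash2s@abc.com\,ou=People\,o=zeb.com - 06:09:48 - http - uizweb_zam -  - 2uid=bolchedssd@abc.com',
    'a line that does not look like a log entry',
);

for my $line (@samples) {
    # The list assignment yields 0 in scalar context when nothing matched
    my ( $date, $uid ) = extract_date_uid($line) or next;
    print "$date and $uid\n";
}
```

Because the pattern requires whitespace on both sides of the separator hyphen, the hyphen inside rad-ash2s is never mistaken for a field boundary, and the non-matching sample line is skipped silently.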

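Answer 1: (score: 0)

This program does what you ask. The field separator appears to be ' - ', that is, a hyphen with a single space on either side, which leaves the penultimate (eleventh) field empty.

The program expects the name of the input file as an argument on the command line.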

use strict;
use warnings;

while (<>) {
  chomp;
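  tr/\\//d;                           # delete every backslash
  my @fields = split /\x20-\x20/;     # split on space, hyphen, space
  printf "%s and %s\n", @fields[0,6]; # first field (timestamp) and seventh (uid)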
}

With your own data, this produces

05/26/2013 06:09:47 -0700 and uid=radash@abc.com,ou=People,o=zeb.com
05/26/2013 06:09:48 -0700 and uid=rad-ash2s@abc.com,ou=People,o=zeb.com
05/26/2013 06:09:49 -0700 and uid=sjhsjdh@abc.com,ou=People,o=zeb.com
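
The output above keeps the UTC offset, which the question's desired output does not show. If it should go, a small sketch along the same lines could strip it from the first field. The helper name split_log_line is hypothetical, and the code assumes the uid entry is always the seventh field and that the offset has the form of a sign followed by four digits.

```perl
use strict;
use warnings;

# Hypothetical helper: split one log line on " - " and return
# (timestamp without the UTC offset, uid field), or an empty list
# when the line has too few fields.
sub split_log_line {
    my ($line) = @_;
    $line =~ tr/\\//d;                        # delete every backslash
    my @fields = split /\x20-\x20/, $line;    # a hyphen without spaces around it is left alone
    return unless @fields > 6;                # assumption: uid is the seventh field
    ( my $stamp = $fields[0] ) =~ s/\s+[-+]\d{4}\z//;    # remove a trailing offset such as -0700
    return ( $stamp, $fields[6] );
}

my $sample = '05\/26\/2013 06:09:48 \-0700 - AUTHN_SUCCESS - GET - ddsbcggio_ac  - 200.12.33.44 - abcweb.eegeserv.com\/abcweb\/abcwebInitialize.do?PORT=SPQ  - uid=rad-ash2s@abc.com\,ou=People\,o=zeb.com - 06:09:48 - http - uizweb_zam -  - 2uid=bolchedssd@abc.com';

if ( my ( $stamp, $uid ) = split_log_line($sample) ) {
    printf "%s and %s\n", $stamp, $uid;
}
```

For the sample line this prints 05/26/2013 06:09:48 and uid=rad-ash2s@abc.com,ou=People,o=zeb.com, with the hyphen in rad-ash2s intact because only ' - ' with surrounding spaces counts as a separator.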