Performance issue when parsing a large file with awk

Time: 2016-06-18 12:30:01

Tags: performance awk

I want to parse a mail.log file to get the last line on which two patterns both occur. The files to parse are between 500 MB and 1 GB in size.

I managed to get it with:

$ time awk ' $5~"postfix/error" && $6~"4F0A73A11CF"  ' MAIL-POSTFIX-LOG-20160226.log | 
tail -1

Feb 26 21:49:23 smtp1 postfix/error[32347]: 4F0A73A11CF: to=<xxxx@xxxxxxxxxxx.xx>, 
relay=none, delay=88661, delays=88661/0.02/0/0.05, dsn=4.4.1, status=deferred (delivery 
temporarily suspended: connect to xxxxxxxxxxxxxxxxxxxx[x.x.x.x]:25: Connection timed out)

real    0m3.572s
user    0m1.920s
sys     0m1.600s

I would like to keep using the awk command, but I badly need better performance when parsing several days of data.

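By using the tac command to reverse the file and start from the last line, I saw a large speedup with grep (the second timing, with cat and tail -1, is the forward scan for comparison):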

$ time tac MAIL-POSTFIX-LOG-20160226.log | grep "postfix/error" | grep -m1 "4F0A73A11CF"

Feb 26 21:49:23 smtp1 postfix/error[32347]: 4F0A73A11CF: to=<xxxx@xxxxxxxxxxx.xx>, 
relay=none, delay=88661, delays=88661/0.02/0/0.05, dsn=4.4.1, status=deferred (delivery 
temporarily suspended: connect to xxxxxxxxxxxxxxxxxxxx[x.x.x.x]:25: Connection timed out)

real    0m0.026s
user    0m0.008s
sys     0m0.016s

$ time cat MAIL-POSTFIX-LOG-20160226.log | grep "postfix/error" | grep "4F0A73A11CF"  | 
tail -1

Feb 26 21:49:23 smtp1 postfix/error[32347]: 4F0A73A11CF: to=<xxxx@xxxxxxxxxxx.xx>, 
relay=none, delay=88661, delays=88661/0.02/0/0.05, dsn=4.4.1, status=deferred (delivery 
temporarily suspended: connect to xxxxxxxxxxxxxxxxxxxx[x.x.x.x]:25: Connection timed out)

real    0m2.979s
user    0m0.280s
sys     0m0.680s

But when I tried combining the tac and awk commands, the performance was not what I expected:

$ time tac MAIL-POSTFIX-LOG-20160226.log | awk ' $5~"postfix/error" && $6~"4F0A73A11CF" ' | 
head -1

Feb 26 21:49:23 smtp1 postfix/error[32347]: 4F0A73A11CF: to=<xxxx@xxxxxxxxxxx.xx>, 
relay=none, delay=88661, delays=88661/0.02/0/0.05, dsn=4.4.1, status=deferred (delivery 
temporarily suspended: connect to xxxxxxxxxxxxxxxxxxxx[x.x.x.x]:25: Connection timed out)

real    0m19.232s
user    0m2.840s
sys     0m4.836s

Any suggestions?

Regards

1 answer:

Answer 0: (score: 0)

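I think the problem is head. With head -1 at the end of the pipeline, awk is not told to stop at the first match and most likely goes on scanning the rest of the reversed file. Performance improves when awk itself exits after the first match: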

time tac MAIL-POSTFIX-LOG-20160226.log |  awk ' $5~"error" && $6~"4F0A73A11CF" {print; exit} '

real    0m0.048s
user    0m0.024s
sys     0m0.020s
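
To make the effect easy to reproduce without a real mail log, here is a minimal sketch of the same technique on made-up data. The four sample log lines, the temporary file and the variable names log and last_match are assumptions for illustration only; the awk program uses the question's original patterns, and tac is assumed to be available (it ships with GNU coreutils).

```shell
#!/bin/sh
# Hypothetical sample data; the real input would be the mail.log file.
log=$(mktemp)
cat > "$log" <<'EOF'
Feb 26 21:40:01 smtp1 postfix/error[32001]: 4F0A73A11CF: status=deferred (first attempt)
Feb 26 21:45:10 smtp1 postfix/smtp[32100]: 4F0A73A11CF: status=deferred (not an error line)
Feb 26 21:49:23 smtp1 postfix/error[32347]: 4F0A73A11CF: status=deferred (last attempt)
Feb 26 21:50:00 smtp1 postfix/error[32400]: 9B1C2D3E4F5: status=deferred (other queue id)
EOF

# tac feeds the file last line first. awk prints the first line whose
# 5th and 6th fields match and then exits, so the remaining (earlier)
# lines are never parsed by awk.
last_match=$(tac "$log" | awk '$5 ~ "postfix/error" && $6 ~ "4F0A73A11CF" { print; exit }')

printf '%s\n' "$last_match"
rm -f "$log"
```

Because the first hit in the reversed stream is the last hit in the file, the result should be the same line that the forward scan with tail -1 returns. How much time is saved depends on how close that line is to the end of the file.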