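I want to parse a mail.log file to get the last line on which two patterns both occur. The files to parse are between 500 MB and 1 GB in size.
I managed to get it with: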
$ time awk ' $5~"postfix/error" && $6~"4F0A73A11CF" ' MAIL-POSTFIX-LOG-20160226.log | tail -1
Feb 26 21:49:23 smtp1 postfix/error[32347]: 4F0A73A11CF: to=<xxxx@xxxxxxxxxxx.xx>,
relay=none, delay=88661, delays=88661/0.02/0/0.05, dsn=4.4.1, status=deferred (delivery
temporarily suspended: connect to xxxxxxxxxxxxxxxxxxxx[x.x.x.x]:25: Connection timed out)
real 0m3.572s
user 0m1.920s
sys 0m1.600s
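For comparison, the same search can stay entirely inside awk and drop the tail process: awk overwrites a variable on every match and prints it at END. This is a minimal sketch, not a benchmarked fix: it still reads the whole file, so it is not expected to be much faster than the command above. The sample log lines and the variable names log, last and last_match are hypothetical, modelled on the line shown above.

```shell
#!/bin/sh
# Hypothetical sample lines modelled on the postfix/error line above.
log=$(mktemp)
cat > "$log" <<'EOF'
Feb 26 10:00:00 smtp1 postfix/error[100]: 4F0A73A11CF: to=<user@example.com>, status=deferred (older)
Feb 26 11:00:00 smtp1 postfix/smtp[200]: 4F0A73A11CF: to=<user@example.com>, status=deferred (not an error line)
Feb 26 21:49:23 smtp1 postfix/error[32347]: 4F0A73A11CF: to=<user@example.com>, status=deferred (newest)
Feb 26 22:00:00 smtp1 postfix/error[400]: 5B1C22D0E12: to=<other@example.com>, status=deferred (other queue ID)
EOF

# Every matching line overwrites "last", so at END it holds the last match.
# The whole file is still read; only the separate tail process goes away.
last_match=$(awk '$5 ~ "postfix/error" && $6 ~ "4F0A73A11CF" { last = $0 }
                  END { if (last != "") print last }' "$log")
printf '%s\n' "$last_match"
rm -f "$log"
```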
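I'd like to keep using awk, but I badly need better performance, because I have to parse several days' worth of logs.
By reversing the file with tac, so that the search starts from the last line, I saw a big performance improvement with grep: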
$ time tac MAIL-POSTFIX-LOG-20160226.log | grep "postfix/error" | grep -m1 "4F0A73A11CF"
Feb 26 21:49:23 smtp1 postfix/error[32347]: 4F0A73A11CF: to=<xxxx@xxxxxxxxxxx.xx>,
relay=none, delay=88661, delays=88661/0.02/0/0.05, dsn=4.4.1, status=deferred (delivery
temporarily suspended: connect to xxxxxxxxxxxxxxxxxxxx[x.x.x.x]:25: Connection timed out)
real 0m0.026s
user 0m0.008s
sys 0m0.016s
$ time cat MAIL-POSTFIX-LOG-20160226.log | grep "postfix/error" | grep "4F0A73A11CF" | tail -1
Feb 26 21:49:23 smtp1 postfix/error[32347]: 4F0A73A11CF: to=<xxxx@xxxxxxxxxxx.xx>,
relay=none, delay=88661, delays=88661/0.02/0/0.05, dsn=4.4.1, status=deferred (delivery
temporarily suspended: connect to xxxxxxxxxxxxxxxxxxxx[x.x.x.x]:25: Connection timed out)
real 0m2.979s
user 0m0.280s
sys 0m0.680s
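The tac version is fast because grep -m1 stops at the first match, which after tac is the last matching line of the file; tac is then killed by SIGPIPE the next time it writes, so the rest of the file is never processed. As a sketch, the two greps can also be folded into one pattern. This assumes the queue ID always follows "postfix/error[PID]: " as in the line above; it removes one process from the pipeline, but it has not been timed. The sample lines and variable names are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample lines modelled on the postfix/error line above.
log=$(mktemp)
cat > "$log" <<'EOF'
Feb 26 10:00:00 smtp1 postfix/error[100]: 4F0A73A11CF: to=<user@example.com>, status=deferred (older)
Feb 26 11:00:00 smtp1 postfix/smtp[200]: 4F0A73A11CF: to=<user@example.com>, status=deferred (not an error line)
Feb 26 21:49:23 smtp1 postfix/error[32347]: 4F0A73A11CF: to=<user@example.com>, status=deferred (newest)
Feb 26 22:00:00 smtp1 postfix/error[400]: 5B1C22D0E12: to=<other@example.com>, status=deferred (other queue ID)
EOF

# tac emits the last line first; grep -m1 prints the first match and exits.
# Assumed layout: "postfix/error[<digits>]: <QUEUE_ID>:".
first_from_end=$(tac "$log" | grep -m1 -E 'postfix/error\[[0-9]+\]: 4F0A73A11CF:')
printf '%s\n' "$first_from_end"
rm -f "$log"
```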
But when I combined tac with awk, the performance was not what I expected:
$ time tac MAIL-POSTFIX-LOG-20160226.log | awk ' $5~"postfix/error" && $6~"4F0A73A11CF" ' | head -1
Feb 26 21:49:23 smtp1 postfix/error[32347]: 4F0A73A11CF: to=<xxxx@xxxxxxxxxxx.xx>,
relay=none, delay=88661, delays=88661/0.02/0/0.05, dsn=4.4.1, status=deferred (delivery
temporarily suspended: connect to xxxxxxxxxxxxxxxxxxxx[x.x.x.x]:25: Connection timed out)
real 0m19.232s
user 0m2.840s
sys 0m4.836s
Any suggestions?
Regards
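Answer 0 (score: 0)
I think the problem is head. head -1 exits after the first line, but awk does not stop there: it only notices that the pipe is closed the next time it writes to it, so it keeps reading the rest of the reversed file. Exiting awk right after the first match improves performance: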
$ time tac MAIL-POSTFIX-LOG-20160226.log | awk ' $5~"error" && $6~"4F0A73A11CF" {print; exit} '
real 0m0.048s
user 0m0.024s
sys 0m0.020s
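To reuse this for other queue IDs over several days of logs, the answer's command can be wrapped in a small shell function that passes the ID in with awk -v instead of hard-coding it. This is a sketch under assumptions: the function name last_error_for_id is hypothetical, and it assumes field 6 is exactly the queue ID followed by a colon, as in the log line in the question, so it compares with == rather than a regex match. The sample lines are also hypothetical.

```shell
#!/bin/sh
# last_error_for_id is a hypothetical helper, not part of any tool.
# Usage: last_error_for_id LOGFILE QUEUE_ID
last_error_for_id() {
    # tac emits the file last line first; awk prints the first postfix/error
    # line whose 6th field is exactly "<QUEUE_ID>:" and exits immediately,
    # so the rest of the reversed stream is never parsed.
    tac "$1" | awk -v id="$2" '$5 ~ "postfix/error" && $6 == (id ":") { print; exit }'
}

# Hypothetical sample lines modelled on the postfix/error line in the question.
log=$(mktemp)
cat > "$log" <<'EOF'
Feb 26 10:00:00 smtp1 postfix/error[100]: 4F0A73A11CF: to=<user@example.com>, status=deferred (older)
Feb 26 11:00:00 smtp1 postfix/smtp[200]: 4F0A73A11CF: to=<user@example.com>, status=deferred (not an error line)
Feb 26 21:49:23 smtp1 postfix/error[32347]: 4F0A73A11CF: to=<user@example.com>, status=deferred (newest)
Feb 26 22:00:00 smtp1 postfix/error[400]: 5B1C22D0E12: to=<other@example.com>, status=deferred (other queue ID)
EOF

result=$(last_error_for_id "$log" 4F0A73A11CF)
printf '%s\n' "$result"
rm -f "$log"
```

Using an exact comparison on field 6 also avoids accidental matches where one queue ID happens to be a substring of another field.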