Here is the simple general format of my logs:
...[XXXHandler] coming time...
...[XXXHandler] [UID] start time...
...[XXXHandler] [UID] spend time...
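To make the format concrete, here is a minimal Python sketch that classifies a single log line. The names COMING_RE, STARTED_RE and classify are made up for illustration, and the patterns assume the handler name and UID never contain a closing bracket.

```python
import re

# Hypothetical patterns for the three line shapes described above.
# Assumption: handler names and UIDs never contain ']'.
COMING_RE = re.compile(r"\[([^\]]+)\] coming time")
STARTED_RE = re.compile(r"\[([^\]]+)\] \[([^\]]+)\] (start|spend) time")


def classify(line):
    """Return (kind, handler, uid) for a log line, or None if it does not match.

    kind is "coming", "start" or "spend"; uid is None for "coming" lines,
    because those lines carry no UID.
    """
    m = STARTED_RE.search(line)
    if m:
        handler, uid, kind = m.groups()
        return kind, handler, uid
    m = COMING_RE.search(line)
    if m:
        return "coming", m.group(1), None
    return None
```

Two separate patterns are used on purpose: a single pattern with an optional UID group would take the leading [240] as the handler name on coming lines.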
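In practice, a large number of requests are logged with their corresponding UIDs, and the three-line patterns are interleaved with one another. Here is part of such a log: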
~ cat sample.log
[240] [DeleteAllLettersHandler] coming time [Fri Mar 18 05:00:00 GMT-06:00 2016]
[240] [StartBiddingAllianceBossAuctionHandler] coming time [Fri Mar 18 05:00:00 GMT-06:00 2016]
[240] [DeleteAllLettersHandler] [13497] start time [Fri Mar 18 05:00:00 GMT-06:00 2016]
[240] [DeleteAllLettersHandler] [13497] spend time [1] dbs 1 dbu 1 | {}
[240] [StartBiddingAllianceBossAuctionHandler] [1495] start time [Fri Mar 18 05:00:00 GMT-06:00 2016]
[240] [GetMazeMainInfoHandler] coming time [Fri Mar 18 05:00:00 GMT-06:00 2016]
[240] [StartBiddingAllianceBossAuctionHandler] [1495] spend time [1] dbs 1 dbu 0 | {}
[240] [GetMazeMainInfoHandler] [8941] start time [Fri Mar 18 05:00:00 GMT-06:00 2016]
[240] [GetResHarvestInfoHandler] coming time [Fri Mar 18 05:00:00 GMT-06:00 2016]
[240] [GetResHarvestInfoHandler] [1807] start time [Fri Mar 18 05:00:00 GMT-06:00 2016]
[240] [RCHandler] coming time [Fri Mar 18 05:00:00 GMT-06:00 2016] ## gotcha
[240] [GetMazeMainInfoHandler] [8941] spend time [10] dbs 27 dbu 2 | {}
[240] [GetResHarvestInfoHandler] [1807] spend time [5] dbs 15 dbu 4 | {}
[240] [StartBiddingAllianceBossAuctionHandler] coming time [Fri Mar 18 05:00:00 GMT-06:00 2016]
[240] [StartBiddingAllianceBossAuctionHandler] [18052] start time [Fri Mar 18 05:00:00 GMT-06:00 2016]
[240] [StartBiddingAllianceBossAuctionHandler] [18052] spend time [1] dbs 1 dbu 0 | {}
[240] [GetResourceAmount] coming time [Fri Mar 18 05:00:00 GMT-06:00 2016]
[240] [GetResourceAmount] [29063] start time [Fri Mar 18 05:00:00 GMT-06:00 2016]
[240] [GetResourceAmount] [29063] spend time [1] dbs 3 dbu 0 | {}
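My requirement is to filter the log by removing every complete three-line pattern, even when the patterns are interleaved, so that I can see which handlers are hung (a coming time was logged but no start time).
Here is my solution: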
~ cat process.sh
sed -r '
$!N
$!N
$!N
s/(([^\n]*\n)*)[^\n]*\[([^\n]*)\] coming time[^\n]*\n(([^\n]*\n)*)[^\n]*\[\3\] \[([^\n]*)\] start time[^\n]*\n(([^\n]*\n)*)[^\n]*\[\3\] \[\6\] spend time[^\n]*(.*)/\1\4\7\9/
t print
P
D
:print
' |
grep -v '^ *$'
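This filters out some of the patterns, but not all of them, because each sed pass only matches a pattern whose lines are spread over a window of three or four lines (more sed rounds may catch more).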
~ ./process.sh < sample.log
[240] [StartBiddingAllianceBossAuctionHandler] coming time [Fri Mar 18 05:00:00 GMT-06:00 2016]
[240] [StartBiddingAllianceBossAuctionHandler] [1495] start time [Fri Mar 18 05:00:00 GMT-06:00 2016]
[240] [GetMazeMainInfoHandler] coming time [Fri Mar 18 05:00:00 GMT-06:00 2016]
[240] [StartBiddingAllianceBossAuctionHandler] [1495] spend time [1] dbs 1 dbu 0 | {}
[240] [GetMazeMainInfoHandler] [8941] start time [Fri Mar 18 05:00:00 GMT-06:00 2016]
[240] [RCHandler] coming time [Fri Mar 18 05:00:00 GMT-06:00 2016] ## gotcha
[240] [GetMazeMainInfoHandler] [8941] spend time [10] dbs 27 dbu 2 | {}
[240] [StartBiddingAllianceBossAuctionHandler] coming time [Fri Mar 18 05:00:00 GMT-06:00 2016]
[240] [StartBiddingAllianceBossAuctionHandler] [18052] start time [Fri Mar 18 05:00:00 GMT-06:00 2016]
[240] [StartBiddingAllianceBossAuctionHandler] [18052] spend time [1] dbs 1 dbu 0 | {}
Using the filtered log as the seed and filtering it again and again, I can get the result I want:
~ ./process.sh < sample.log | ./process.sh
[240] [GetMazeMainInfoHandler] coming time [Fri Mar 18 05:00:00 GMT-06:00 2016]
[240] [GetMazeMainInfoHandler] [8941] start time [Fri Mar 18 05:00:00 GMT-06:00 2016]
[240] [RCHandler] coming time [Fri Mar 18 05:00:00 GMT-06:00 2016] ## gotcha
[240] [GetMazeMainInfoHandler] [8941] spend time [10] dbs 27 dbu 2 | {}
~ ./process.sh < sample.log | ./process.sh | ./process.sh
[240] [RCHandler] coming time [Fri Mar 18 05:00:00 GMT-06:00 2016] ## gotcha
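It seems I only need to filter a few times to get the final result I need. So I asked a question, "shell pipe process repeat", and @tripleee's answer was very useful to me. After about five rounds of filtering I can get the final result for each log.
But it takes too long: a 10K-line log usually takes about 10 minutes to filter.
So my question is: can you find a better way to do this, or a way to improve my approach so that it runs faster?
Thanks for your time!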
Answer 0 (score: 0)
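I don't think bash is up to your problem.
I would suggest you try Perl. Parse the log and save [Handler Name, question, start, finish] 4-tuples in a hash table; then you can scan the hash table to find the hung handlers. That is a more scalable solution, IMHO.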
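As a rough illustration of the hash-table idea, here is a single-pass sketch. It is written in Python rather than the Perl suggested above, and it is only one possible reading of the approach. Assumptions not stated in the question: a start line is paired with the oldest unmatched coming line of the same handler (coming lines carry no UID), a spend line closes the start line with the same handler and UID, and only complete coming/start/spend triples are dropped. The names filter_pending, COMING_RE and STARTED_RE are hypothetical.

```python
import re
import sys
from collections import defaultdict, deque

# Assumption: handler names and UIDs never contain ']'.
COMING_RE = re.compile(r"\[([^\]]+)\] coming time")
STARTED_RE = re.compile(r"\[([^\]]+)\] \[([^\]]+)\] (start|spend) time")


def filter_pending(lines):
    """Return the non-blank lines that are not part of a complete triple."""
    lines = [ln.rstrip("\n") for ln in lines]
    waiting = defaultdict(deque)  # handler -> indices of unmatched "coming" lines
    running = {}                  # (handler, uid) -> (coming index, start index)
    finished = set()              # indices of lines in complete triples

    for i, line in enumerate(lines):
        m = STARTED_RE.search(line)
        if m:
            handler, uid, kind = m.groups()
            if kind == "start":
                # Pair with the oldest "coming" of this handler (FIFO assumption).
                if waiting[handler]:
                    running[(handler, uid)] = (waiting[handler].popleft(), i)
            elif (handler, uid) in running:
                # "spend" completes the triple: mark all three lines as done.
                finished.update(running.pop((handler, uid)))
                finished.add(i)
            continue
        m = COMING_RE.search(line)
        if m:
            waiting[m.group(1)].append(i)

    return [ln for i, ln in enumerate(lines) if i not in finished and ln.strip()]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python filter_pending.py sample.log
    with open(sys.argv[1]) as log:
        for kept in filter_pending(log):
            print(kept)
```

Traced by hand on the sample.log from the question, this leaves only the [RCHandler] coming line, the same result as three rounds of process.sh. Each line is visited once, so the cost grows linearly with the log size instead of requiring repeated sed passes. A handler that started but never logged a spend time stays in the output too, together with its coming line.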