A new parsing problem! Every day my server generates logs like this:
2016-12-31 23:10:29 (UTC) SV-SRV-ABCDEF: PROBLEM [32141] Bla bla bla some text here [12345](High|Ack: No)
2016-12-31 23:10:30 (UTC) SV-SRV-ZXCVBN: PROBLEM [3232] Some other different text [86578](High|Ack: No)
2016-12-31 23:13:59 (UTC) SERVER444: PROBLEM [6565] Still some different stuff [64221](High|Ack: No)
2016-12-31 23:22:25 (UTC) SF-BIZ-IIUUYY: PROBLEM [876543] Guess what, another blabla [73794](Disaster|Ack: No)
2016-12-31 23:23:12 (UTC) SW-ZBC-FFDSDE1: PROBLEM [8765] Host down [16852](Warning|Ack: No)
2016-12-31 23:28:55 (UTC) SF-ZNC-IGFDOIS01: PROBLEM [764389] Managment interface down [29426](Disaster|Ack: No)
2016-12-31 23:30:25 (UTC) KJOIUYTR0-01: PROBLEM [5437823] bla bli blo blu bli [29426](Disaster|Ack: No)
2016-12-31 23:35:38 (UTC) CD-TCA-ZNCVBT01: PROBLEM [7652268] Another different message that includes [] in it [16316](Average|Ack: No)
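As you can see, the text can be completely different from one line to the next: the number of words varies, it may contain [], and so on. I need to insert this log into a database, split into specific fields (see below).
I know how to parse the first parameters (date, time, server), but after that I don't know how to parse the message itself, then the eventid (the number in the last pair of brackets, i.e. 12345 on the first line) and the final arguments. Ideally, I need to parse this log as follows: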
date, time, server, message (without the PROBLEM and [] that starts the message), eventid, priority, ack
For the first line of the log, this would be:
2016-12-31, 23:10:29, SV-SRV-ABCDEF, Bla bla bla some text here, 12345, High, No
Any clue how to do this? I usually use bash for this kind of thing, but I can use Ruby/Python/Perl if that's easier.
Answer 0 (score: 1)
The following awk script may help you solve this.
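awk -F" PROBLEM " '{
  # $1 = "date time (UTC) server:"; drop " (UTC)" and the trailing colon
  gsub(/ \(.*\)|:$/,"",$1)
  # $2 = "[id] message [eventid](priority|Ack: ack)"; drop the leading "[id"
  sub(/.[^\]]*/,"",$2);
  # ...then the "] " that follows it
  sub(/] /,"",$2);
  # delete from the first "]" to the last "[": this only matches when the message itself contains a "]", and then it drops part of the message
  sub(/\].*\[/,"",$2);
  # "(priority" -> " priority"
  sub(/\(/," ",$2);
  # "|Ack:" -> ","
  gsub(/\|[^ ]*/,",",$2);
  # " [eventid] " -> ", eventid, "
  gsub(/ \[|\] /,", ",$2);
  # drop the closing ")"
  sub(/\)$/,"",$2);
  print $1,$2
}
' OFS=", " Input_file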
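The output is shown below. Note that date, time and server remain separated by spaces rather than commas, and on the last line the "[] in it" part of the message is lost.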
2016-12-31 23:10:29 SV-SRV-ABCDEF, Bla bla bla some text here, 12345, High, No
2016-12-31 23:10:30 SV-SRV-ZXCVBN, Some other different text, 86578, High, No
2016-12-31 23:13:59 SERVER444, Still some different stuff, 64221, High, No
2016-12-31 23:22:25 SF-BIZ-IIUUYY, Guess what, another blabla, 73794, Disaster, No
2016-12-31 23:23:12 SW-ZBC-FFDSDE1, Host down, 16852, Warning, No
2016-12-31 23:28:55 SF-ZNC-IGFDOIS01, Managment interface down, 29426, Disaster, No
2016-12-31 23:30:25 KJOIUYTR0-01, bla bli blo blu bli, 29426, Disaster, No
2016-12-31 23:35:38 CD-TCA-ZNCVBT01, Another different message that includes, 16316, Average, No
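The fourth line's message ("Guess what, another blabla") itself contains a comma, so a plain comma-joined line becomes ambiguous once it is loaded into a database or spreadsheet. As a minimal sketch, assuming the seven fields have already been extracted (the rows list below is hypothetical sample data copied from the log above), Python's standard csv module quotes such fields automatically:

```python
import csv
import io

# Hypothetical sample: two rows already split into the seven requested fields.
rows = [
    ("2016-12-31", "23:10:29", "SV-SRV-ABCDEF", "Bla bla bla some text here", "12345", "High", "No"),
    ("2016-12-31", "23:22:25", "SF-BIZ-IIUUYY", "Guess what, another blabla", "73794", "Disaster", "No"),
]

def to_csv(rows):
    """Return the rows as CSV text; fields that contain a comma get quoted."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

print(to_csv(rows), end="")
# 2016-12-31,23:10:29,SV-SRV-ABCDEF,Bla bla bla some text here,12345,High,No
# 2016-12-31,23:22:25,SF-BIZ-IIUUYY,"Guess what, another blabla",73794,Disaster,No
```

Reading the file back with csv.reader recovers "Guess what, another blabla" as a single field.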
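Answer 1 (score: 1)
As you suggested, Python is well suited to a log-parsing task like this.
Based on the pattern you described, here is a complete script that parses each line robustly: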
import re
import sys

# One named group per output field; the "[NNN]" right after PROBLEM is matched but not captured.
# Every piece is a raw string, so escapes like \d and \[ reach the regex engine unchanged.
pattern = re.compile(
    r'^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) \(UTC\) '
    r'(?P<server>[\w-]+): PROBLEM \[\d+\] (?P<message>.*) '
    r'\[(?P<eventid>\d+)\]\((?P<priority>\w+)\|Ack: (?P<ack>\w+)\)$'
)

for line in sys.stdin:
    m = pattern.match(line.strip())
    if m:
        print("date: {date}, time: {time}, server: {server}, message: {message!r},"
              " eventid: {eventid}, priority: {priority}, ack: {ack}".format(**m.groupdict()))
Running it on the sample log produces:
$ python parse.py <log
date: 2016-12-31, time: 23:10:29, server: SV-SRV-ABCDEF, message: 'Bla bla bla some text here', eventid: 12345, priority: High, ack: No
date: 2016-12-31, time: 23:10:30, server: SV-SRV-ZXCVBN, message: 'Some other different text', eventid: 86578, priority: High, ack: No
date: 2016-12-31, time: 23:13:59, server: SERVER444, message: 'Still some different stuff', eventid: 64221, priority: High, ack: No
date: 2016-12-31, time: 23:22:25, server: SF-BIZ-IIUUYY, message: 'Guess what, another blabla', eventid: 73794, priority: Disaster, ack: No
date: 2016-12-31, time: 23:23:12, server: SW-ZBC-FFDSDE1, message: 'Host down', eventid: 16852, priority: Warning, ack: No
date: 2016-12-31, time: 23:28:55, server: SF-ZNC-IGFDOIS01, message: 'Managment interface down', eventid: 29426, priority: Disaster, ack: No
date: 2016-12-31, time: 23:30:25, server: KJOIUYTR0-01, message: 'bla bli blo blu bli', eventid: 29426, priority: Disaster, ack: No
date: 2016-12-31, time: 23:35:38, server: CD-TCA-ZNCVBT01, message: 'Another different message that includes [] in it', eventid: 16316, priority: Average, ack: No
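Since the end goal is to load these fields into a database, here is a minimal sketch of the same regex feeding Python's built-in sqlite3 module. The table name events, its column names and the in-memory database are assumptions for illustration; with another database driver the parameter placeholder style may differ.

```python
import re
import sqlite3

# Same pattern as the script above, compiled once.
PATTERN = re.compile(
    r'^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) \(UTC\) '
    r'(?P<server>[\w-]+): PROBLEM \[\d+\] (?P<message>.*) '
    r'\[(?P<eventid>\d+)\]\((?P<priority>\w+)\|Ack: (?P<ack>\w+)\)$'
)

def load(lines, conn):
    """Insert every matching line into the hypothetical events table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (date TEXT, time TEXT, server TEXT,"
        " message TEXT, eventid INTEGER, priority TEXT, ack TEXT)"
    )
    matches = (PATTERN.match(line.strip()) for line in lines)
    rows = [m.groupdict() for m in matches if m]
    # Named placeholders (:date, :time, ...) are filled from each groupdict().
    conn.executemany(
        "INSERT INTO events VALUES (:date, :time, :server, :message, :eventid, :priority, :ack)",
        rows,
    )
    conn.commit()
    return len(rows)

sample = [
    "2016-12-31 23:10:29 (UTC) SV-SRV-ABCDEF: PROBLEM [32141] Bla bla bla some text here [12345](High|Ack: No)",
    "2016-12-31 23:35:38 (UTC) CD-TCA-ZNCVBT01: PROBLEM [7652268] Another different message that includes [] in it [16316](Average|Ack: No)",
]
conn = sqlite3.connect(":memory:")
print(load(sample, conn))  # prints 2
```

To load the real log, pass the open file object (or sys.stdin) as lines and a file-backed connection instead of ":memory:".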
Answer 2 (score: 1)
With Perl you can do:
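use strict;
use warnings;
open(my $fh, '<', 'filename') or die "Cannot open filename: $!";
while (my $inline = <$fh>) {
    # captures date, time, "(UTC)", server, and the rest of the line after "server: "; non-matching lines are skipped
    my ($date, $time, $utc, $server, $message) =
        $inline =~ /^(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2})\s+(\(\w{3}\))\s+([^:]*):\s+(.*)/ or next;
    print "$date $time $utc $server $message\n";
}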
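This snippet stops after the server name, so $message still holds the whole tail, e.g. "PROBLEM [32141] Bla bla bla some text here [12345](High|Ack: No)". Because the free text may itself contain brackets, the tail is easiest to split by anchoring the regex at the end of the line, where the eventid/priority/ack group always sits. A minimal sketch of that step, written in Python (split_tail is a hypothetical helper name):

```python
import re

# Anchored at both ends: "PROBLEM [id] <text> [eventid](priority|Ack: ack)".
# The greedy (.*) backtracks only as far as the LAST " [digits](...)" group,
# so brackets inside the free text are kept in the message.
TAIL = re.compile(r'^PROBLEM \[\d+\] (.*) \[(\d+)\]\((\w+)\|Ack: (\w+)\)$')

def split_tail(tail):
    """Return (message, eventid, priority, ack), or None if the tail does not match."""
    m = TAIL.match(tail)
    return m.groups() if m else None

print(split_tail("PROBLEM [7652268] Another different message that includes [] in it [16316](Average|Ack: No)"))
# ('Another different message that includes [] in it', '16316', 'Average', 'No')
```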
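Answer 3 (score: 1)
For this kind of problem sed is a very powerful tool: it processes your log line by line and can do almost anything with each line. The simplest approach is to apply a series of small substitutions, each one bringing you closer to the desired result. For example, step 1 (s/ /, /) replaces the space between the date and the time with a comma and a space, step 2 (s/ (UTC)/,/) replaces the time zone, and so on; you get the picture. The end result looks like this (the substitutions are separated by semicolons):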
sed 's/ /, /;s/ (UTC)/,/;s/: PROBLEM \[[0-9]*\]/,/;s/ \[\([0-9]*\)](/, \1, /;s/|Ack:/,/;s/)$//' logfile > result
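Alternatively, you could do everything in a single substitution with one well-crafted regular expression, but the step-by-step approach above is easier to get right, and you can test each step as you go.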
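For comparison, here is a minimal sketch of that single-substitution approach, written with Python's re.sub rather than sed. The regex and the helper name to_fields_line are illustrative and not taken from the answer. Lines that don't match come back unchanged, just as sed would leave them:

```python
import re

# One regex captures all seven fields; the replacement joins them with ", ".
# The greedy (.*) for the message backtracks to the last "[eventid](...)" group.
LINE = re.compile(
    r'^(\S+) (\S+) \(UTC\) ([^:]+): PROBLEM \[\d+\] (.*) \[(\d+)\]\((\w+)\|Ack: (\w+)\)$'
)

def to_fields_line(line):
    """Rewrite one log line as 'date, time, server, message, eventid, priority, ack'."""
    return LINE.sub(r'\1, \2, \3, \4, \5, \6, \7', line.rstrip("\n"))

print(to_fields_line("2016-12-31 23:10:29 (UTC) SV-SRV-ABCDEF: PROBLEM [32141] Bla bla bla some text here [12345](High|Ack: No)"))
# 2016-12-31, 23:10:29, SV-SRV-ABCDEF, Bla bla bla some text here, 12345, High, No
```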