Background
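I want to develop a program that extracts fields from unstructured log data. I am using grok to identify the regular expressions that match an input string.
Example:
Consider this Cisco PIX log line: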
Mar 29 2004 09:54:18: %PIX-6-302005: Built UDP connection for faddr 198.207.223.240/53337 gaddr 10.0.0.187/53 laddr 192.168.0.2/53
For the log line above, I identified the following regular expressions:
CISCOTIMESTAMP - \b(?:Jan(?:uary|uar)?|Feb(?:ruary|ruar)?|M(?:a|ä)?r(?:ch|z)?|Apr(?:il)?|Ma(?:y|i)?|Jun(?:e|i)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|O(?:c|k)?t(?:ober)?|Nov(?:ember)?|De(?:c|z)(?:ember)?)\b +(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])(?: (?>\d\d){1,2})? (?!<[0-9])(?:2[0123]|[01]?[0-9]):(?:[0-5][0-9])(?::(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?))(?![0-9])
CISCOTAG - [A-Z0-9]+-(?:[+-]?(?:[0-9]+))-(?:[A-Z0-9_]+)
CISCOACTION - Built|Teardown|Deny|Denied|denied|requested|permitted|denied by ACL|discarded|est-allowed|Dropping|created|deleted
IPV4 - (?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])
URIPATH - (?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_\-]*)+(?:\?[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]<>]*)?
Problem
Now I want to merge these regular expressions into one, but I also need to allow for filler text between them. Example:
Built|Teardown|Deny|Denied|denied|requested|permitted|denied by ACL|discarded|est-allowed|Dropping|created|deleted
This regular expression matches the word Built in the log line, and
(?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])
this one identifies the first IP address, 198.207.223.240.
However, when I merge them on regex101.com like this:
(Built|Teardown|Deny|Denied|denied|requested|permitted|denied by ACL|discarded|est-allowed|Dropping|created|deleted) ((?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9]))
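they clearly do not fit together, because there are some words between them (UDP connection for faddr), which I call "filler".
I want to combine the capturing regular expressions while allowing for arbitrary filler between them.
Is there a way to do this?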
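My approach
I tried using (.*) and (.*?), but they are too powerful: they take over from the other patterns and match the entire rest of the line.
Can someone help me get to the desired result?
The desired result is: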
CISCOTIMESTAMP + [FILLER REGEX] + CISCOTAG + [FILLER REGEX] + CISCOACTION + [FILLER REGEX] + IPv4 + URIPATH + and so on.
Answer 0 (score: 0)
URIPATH does not seem to work on regex101 as written, because the '/' is not escaped.
Once that is done, it works.
URIPATH: ((?:\/[A-Za-z0-9$.+!*'(){},~:;=@#%_\-]*)+(?:\?[A-Za-z0-9$.+!*'|(){},~@#%&\/=:;_?\-\[\]<>]*)?)
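The escaping should only matter where '/' doubles as the pattern delimiter, which appears to be the case on regex101 with its default settings. As a small check, assuming Python's re module (which has no delimiters), the escaped and unescaped forms of URIPATH find the same text in the sample log line:

```python
import re

# URIPATH as written in the question ('/' not escaped).
URIPATH_RAW = r"(?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_\-]*)+(?:\?[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]<>]*)?"
# URIPATH as written in this answer ('/' escaped).
URIPATH_ESCAPED = r"(?:\/[A-Za-z0-9$.+!*'(){},~:;=@#%_\-]*)+(?:\?[A-Za-z0-9$.+!*'|(){},~@#%&\/=:;_?\-\[\]<>]*)?"

LINE = (
    "Mar 29 2004 09:54:18: %PIX-6-302005: Built UDP connection for "
    "faddr 198.207.223.240/53337 gaddr 10.0.0.187/53 laddr 192.168.0.2/53"
)

# In Python, "\/" is simply a literal "/", so both forms behave the same.
raw_match = re.search(URIPATH_RAW, LINE)
escaped_match = re.search(URIPATH_ESCAPED, LINE)
print(raw_match.group(0), escaped_match.group(0))
```

So the escaped form is safe to use everywhere, while the unescaped form is only a problem in tools that treat '/' as a delimiter.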
The rest works fine with .* as the filler regex.
CISCOTIMESTAMP + [FILLER REGEX] + CISCOTAG + [FILLER REGEX] + CISCOACTION + [FILLER REGEX] + IPv4 + URIPATH
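The scheme above can be sketched in Python by wrapping each pattern in its own group and joining the groups with a filler that is not itself captured. This is only an illustration under some assumptions: the helper `named`, the group names, and the lazy filler `.*?` are choices made for this sketch, not part of grok, and CISCOTIMESTAMP is left out because its atomic group `(?>...)` is only accepted by Python's re from version 3.11.

```python
import re

# Patterns copied from the question (CISCOTIMESTAMP omitted, see note above).
PATTERNS = {
    "tag": r"[A-Z0-9]+-(?:[+-]?(?:[0-9]+))-(?:[A-Z0-9_]+)",
    "action": r"Built|Teardown|Deny|Denied|denied|requested|permitted|denied by ACL|discarded|est-allowed|Dropping|created|deleted",
    "ip": r"(?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])",
    "path": r"(?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_\-]*)+(?:\?[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]<>]*)?",
}

# Hypothetical filler: lazy, and deliberately not wrapped in a capturing group.
FILLER = r".*?"


def named(name):
    """Wrap a pattern in a named group so its alternation stays contained."""
    return f"(?P<{name}>{PATTERNS[name]})"


# tag + filler + action + filler + ip immediately followed by path
COMBINED = re.compile(
    FILLER.join([named("tag"), named("action"), named("ip") + named("path")])
)

LINE = (
    "Mar 29 2004 09:54:18: %PIX-6-302005: Built UDP connection for "
    "faddr 198.207.223.240/53337 gaddr 10.0.0.187/53 laddr 192.168.0.2/53"
)

match = COMBINED.search(LINE)
fields = match.groupdict()
print(fields)
```

Grouping each piece matters most for CISCOACTION: without the surrounding parentheses, its top-level alternation would split the whole combined expression into unrelated alternatives.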
The whole regex is below:
(\b(?:Jan(?:uary|uar)?|Feb(?:ruary|ruar)?|M(?:a|ä)?r(?:ch|z)?|Apr(?:il)?|Ma(?:y|i)?|Jun(?:e|i)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|O(?:c|k)?t(?:ober)?|Nov(?:ember)?|De(?:c|z)(?:ember)?)\b +(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])(?: (?>\d\d){1,2})? (?!<[0-9])(?:2[0123]|[01]?[0-9]):(?:[0-5][0-9])(?::(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?))(?![0-9])).*([A-Z0-9]+-(?:[+-]?(?:[0-9]+))-(?:[A-Z0-9_]+)).*(Built|Teardown|Deny|Denied|denied|requested|permitted|denied by ACL|discarded|est-allowed|Dropping|created|deleted).*((?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9]))((?:\/[A-Za-z0-9$.+!*'(){},~:;=@#%_\-]*)+(?:\?[A-Za-z0-9$.+!*'|(){},~@#%&\/=:;_?\-\[\]<>]*)?)
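One caveat may be worth checking, since the question asked for the first IP address: .* is greedy, so in a backtracking engine the filler before IPv4 runs to the end of the line and backs off only as far as needed, which selects the last address that is followed by a path. A minimal comparison, assuming Python's re and a hypothetical helper `extract`, with only the action, IPv4 and URIPATH pieces to keep it short:

```python
import re

# Patterns copied from the question.
ACTION = r"Built|Teardown|Deny|Denied|denied|requested|permitted|denied by ACL|discarded|est-allowed|Dropping|created|deleted"
IPV4 = r"(?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])"
URIPATH = r"(?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_\-]*)+(?:\?[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]<>]*)?"

LINE = (
    "Mar 29 2004 09:54:18: %PIX-6-302005: Built UDP connection for "
    "faddr 198.207.223.240/53337 gaddr 10.0.0.187/53 laddr 192.168.0.2/53"
)


def extract(filler):
    """Return (ip, path) captured after the action, using the given filler."""
    pattern = f"(?P<action>{ACTION}){filler}(?P<ip>{IPV4})(?P<path>{URIPATH})"
    m = re.search(pattern, LINE)
    return m.group("ip"), m.group("path")


greedy = extract(r".*")   # filler grabs as much as it can
lazy = extract(r".*?")    # filler grabs as little as it can
print("greedy:", greedy)
print("lazy:  ", lazy)
```

With the greedy filler the captured pair is the laddr value, and with the lazy filler it is the faddr value, so `.*?` between the pieces is the closer fit if the first address is the one wanted.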
Here is a link to the demo.