I have a regular expression (Perl-compatible) that handles part of my data. Given the log entry:
pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>
I can use the regex: [\>\:]*\s+(.*?)\:?\s\<(.+?)\>
and get the results I am looking for. (http://regexr.com/3fatg)
Authentication = succeeded
for = active directory
user = bobtheperson
account = bobtheperson@com.com
reason = N/A
Access cont(upn) = bob
Unfortunately, when I built this regex I overlooked an important part of the log: the beginning. The log actually looks like this:
Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>
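My extraction no longer works properly; it is thrown off by the leading part. (http://regexr.com/3fbod) How can I exclude the leading information from this log line?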
**Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info]** pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>
I think I need to start searching after the last occurrence of : (the one before pam_vas), but I can't figure out how to exclude it.
Answer 0: (score: 3)
Update: I misunderstood the question. The best regex seems to be
(?:^.*?pam_vas:)?\s+([^<:]*):?[ ]<([^>]*)>
I played with a few variants, but found this one to be the fastest; it matches and discards the datestamp prefix.
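As a quick check, here is a minimal sketch that applies that pattern to the full log line from the question. It uses Python's re module instead of a PCRE engine (an assumption of this sketch; the pattern only uses syntax both understand), and the names LOG_LINE, PATTERN and extract_pairs are made up for illustration.

```python
import re

# Full log line from the question, including the syslog-style prefix.
LOG_LINE = (
    "Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] pam_vas: "
    "Authentication <succeeded> for <active directory> "
    "user: <bobtheperson> account: <bobtheperson@com.com> "
    "reason: <N/A> Access cont(upn): <bob>"
)

# The optional leading group swallows everything up to "pam_vas:" on the
# first match; on later matches ^ cannot match, so the group is skipped.
PATTERN = re.compile(r"(?:^.*?pam_vas:)?\s+([^<:]*):?[ ]<([^>]*)>")


def extract_pairs(line):
    """Return (key, value) tuples in the order they appear in the line."""
    return PATTERN.findall(line)


for key, value in extract_pairs(LOG_LINE):
    print(f"{key} = {value}")
```

Because the prefix group is optional, the same helper should also work on a line that starts directly with pam_vas:.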
This may suffice (?:^\*\*[^*]*\*\*[ ]pam_vas:)?\s+([^<:]*):?[ ]<([^>]*)>
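Unless you are using something with an ignore-whitespace (extended) mode, you can drop the square brackets around the single spaces, i.e. change [ ] to a plain space.
There are shorter variants, but they have the drawback of either capturing too much or taking many steps: roughly 500-800 for everything I found, compared with 104 here.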
(?: # Opens non-capturing group (ncg)
^ # ^ start of line, you may actually not want this
\*\* # Literally **
[^*]* # Anything but *, as many times as possible
\*\* # Literally **
[ ] # A single space, only in brackets for visibility
pam_vas: # Literally pam_vas:
) # Closes NCG
? # Iterates NCG 0 or 1 times, thus "optional"
\s+ # One or more whitespace characters
( # Opens Capturing Group 1
[^<:]* # Any Character but < or :, as many times as possible
) # Closes CG1
:? # :, 0 or 1 times
[ ] # A single space, only in brackets for visibility
< # Literally <
( # Opens CG2
[^>]* # Any character but >, as many times as possible
) # Closes CG2
> # Literally >
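The breakdown above maps directly onto an ignore-whitespace mode, which is also where the [ ] notation matters. The following is only a sketch in Python (re.VERBOSE standing in for PCRE's x flag), and it assumes the ** markers are literally present in the input, as this variant of the pattern does. MARKED_LINE, VERBOSE_PATTERN and to_fields are hypothetical names.

```python
import re

# Log line with the literal ** markers that this variant expects.
MARKED_LINE = (
    "**Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info]** pam_vas: "
    "Authentication <succeeded> for <active directory> "
    "user: <bobtheperson> account: <bobtheperson@com.com> "
    "reason: <N/A> Access cont(upn): <bob>"
)

# In verbose mode, whitespace outside a character class is ignored,
# so a literal space has to be written as [ ].
VERBOSE_PATTERN = re.compile(
    r"""
    (?:                   # optional non-capturing prefix group
      ^ \*\* [^*]* \*\*   # header wrapped in two asterisks on each side
      [ ] pam_vas:        # a space, then the literal tag
    )?
    \s+                   # one or more whitespace characters
    ([^<:]*)              # group 1: the key
    :? [ ]                # optional colon, then a single space
    < ([^>]*) >           # group 2: the value inside angle brackets
    """,
    re.VERBOSE,
)


def to_fields(line):
    """Collect the key/value pairs of one log line into a dict."""
    return dict(VERBOSE_PATTERN.findall(line))


fields = to_fields(MARKED_LINE)
print(fields["account"])
```

A dict is convenient here because the keys in the sample line are unique; if a key could repeat, keeping the list of tuples would be the safer choice.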
Answer 1: (score: 0)
You can achieve this with:
\b # a word boundary
(?P<key>[\w(): ]+) # the key part - word characters, (, ), :, spaces
\h+ # at least one horizontal whitespace character (can be more)
<(?P<value>[^>]+)> # the value part in <> brackets
See a demo on regex101.com. This way, there is no need to ignore anything.
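For anyone trying this outside PCRE, here is a hedged sketch in Python. Python's built-in re module does not support \h, so [ \t] is substituted for it (a substitution made for this sketch, not part of the answer). On the sample line the key group keeps the trailing colon, and the first key comes back with the pam_vas: tag attached, so a hypothetical clean_key helper trims both. All names are illustrative.

```python
import re

LOG_LINE = (
    "Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] pam_vas: "
    "Authentication <succeeded> for <active directory> "
    "user: <bobtheperson> account: <bobtheperson@com.com> "
    "reason: <N/A> Access cont(upn): <bob>"
)

# Same structure as the pattern above, with [ \t]+ standing in for \h+.
NAMED_PATTERN = re.compile(r"\b(?P<key>[\w(): ]+)[ \t]+<(?P<value>[^>]+)>")


def raw_pairs(line):
    """Return (key, value) tuples exactly as the pattern captures them."""
    return [(m.group("key"), m.group("value")) for m in NAMED_PATTERN.finditer(line)]


def clean_key(key):
    """Drop a trailing colon and anything up to the last ': ' (e.g. 'pam_vas: ')."""
    return key.rstrip(":").rsplit(": ", 1)[-1]


for key, value in raw_pairs(LOG_LINE):
    print(f"{clean_key(key)} = {value}")
```

The leading datestamp never matches because none of its pieces is followed by whitespace and an opening <, which is why nothing has to be skipped explicitly.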
Answer 2: (score: 0)