Regex - ignore a line up to a given point

Time: 2017-02-21 19:29:25

Tags: regex key-value

I have a regex (Perl-compatible) that works on part of my data. Given the log entry:

pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>

I can use the regex [\>\:]*\s+(.*?)\:?\s\<(.+?)\> and get the results I'm looking for (http://regexr.com/3fatg):

Authentication = succeeded
for = active directory
user = bobtheperson
account = bobtheperson@com.com
reason = N/A
Access cont(upn) = bob
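
For readers who want to reproduce this outside regexr, here is a minimal Python sketch. It assumes Python's `re` module behaves like the Perl-compatible engine for this particular pattern; the names `PATTERN`, `LOG`, and `extract_pairs` are illustrative and not part of the original post.

```python
import re

# The asker's original pattern, copied unchanged.
PATTERN = re.compile(r"[\>\:]*\s+(.*?)\:?\s\<(.+?)\>")

# The short log entry from the question.
LOG = (
    "pam_vas: Authentication <succeeded> for <active directory> "
    "user: <bobtheperson> account: <bobtheperson@com.com> "
    "reason: <N/A> Access cont(upn): <bob>"
)


def extract_pairs(line):
    """Hypothetical helper: return (key, value) tuples for every match."""
    return [(m.group(1), m.group(2)) for m in PATTERN.finditer(line)]


for key, value in extract_pairs(LOG):
    print(f"{key} = {value}")
```

Each match contributes one key (group 1) and one value (group 2), which is how the list above is produced.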

Unfortunately, when I built this regex I overlooked an important part of the log: the beginning. The log actually looks like this:

Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>

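My extraction no longer works correctly; it is thrown off by the first part (http://regexr.com/3fbod). How can I exclude the leading information from this log line?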

**Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info]** pam_vas: Authentication <succeeded> for <active directory> user: <bobtheperson> account: <bobtheperson@com.com> reason: <N/A> Access cont(upn): <bob>

I think I need to start searching after the last occurrence of : (the one before pam_vas), but I can't figure out how to exclude it.
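
The failure can be reproduced with a short Python sketch (an illustration only, assuming Python's `re` matches the Perl-compatible behaviour here; `PATTERN`, `FULL_LOG`, and `first_key` are made-up names). The lazy `(.*?)` starts at the first whitespace in the line and keeps expanding until it reaches the first ` <`, so the first key swallows the whole syslog prefix.

```python
import re

# Same pattern as in the question.
PATTERN = re.compile(r"[\>\:]*\s+(.*?)\:?\s\<(.+?)\>")

# The full log line, including the syslog prefix.
FULL_LOG = (
    "Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] "
    "pam_vas: Authentication <succeeded> for <active directory> "
    "user: <bobtheperson> account: <bobtheperson@com.com> "
    "reason: <N/A> Access cont(upn): <bob>"
)

# Look only at the first match: its key is no longer just "Authentication".
first_key = PATTERN.search(FULL_LOG).group(1)
print(repr(first_key))
```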

3 Answers:

Answer 0: (score: 3)

Update: I misunderstood the question. The best regex appears to be

(?:^.*?pam_vas:)?\s+([^<:]*):?[ ]<([^>]*)>

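I played with several variants, but found this one to be the fastest. It captures the key/value pairs and ignores the date stamp.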
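
A quick way to check the updated pattern is the following Python sketch. It assumes Python's `re` module is close enough to the Perl-compatible engine for this pattern; `UPDATED`, `FULL_LOG`, and `pairs` are illustrative names.

```python
import re

# Updated pattern from this answer: an optional prefix up to "pam_vas:",
# then repeated key/value pairs.
UPDATED = re.compile(r"(?:^.*?pam_vas:)?\s+([^<:]*):?[ ]<([^>]*)>")

FULL_LOG = (
    "Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] "
    "pam_vas: Authentication <succeeded> for <active directory> "
    "user: <bobtheperson> account: <bobtheperson@com.com> "
    "reason: <N/A> Access cont(upn): <bob>"
)

# The prefix group can only match at the start of the string (because of ^),
# so it is consumed by the first match and skipped by all later ones.
pairs = [(m.group(1), m.group(2)) for m in UPDATED.finditer(FULL_LOG)]

for key, value in pairs:
    print(f"{key} = {value}")
```

Because the prefix sits in a non-capturing group, it is matched and discarded, and groups 1 and 2 hold only the key and the value.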

If the prefix really is wrapped in literal ** markers, this may suffice: (?:^\*\*[^*]*\*\*[ ]pam_vas:)?\s+([^<:]*):?[ ]<([^>]*)>

Unless you are using something like an ignore-whitespace (extended) mode, you can drop the square brackets around the single space: [ ]

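There are shorter variants, but they have the drawback of either capturing too much or taking many more steps: roughly 500-800 for everything I found, compared with 104 for this one.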

(?:              # Opens non-capturing group (ncg)
  ^              # ^ start of line, you may actually not want this
  \*\*           # Literally ** 
  [^*]*          # Anything but *, as many times as possible 
  \*\*           # Literally **
  [ ]            # A single space, only in brackets for visibility 
  pam_vas:       # Literally pam_vas: 
)                # Closes NCG
?                # Iterates NCG 0 or 1 times, thus "optional" 
\s+              # One or more whitespace characters
(                # Opens Capturing Group 1
  [^<:]*         # Any Character but < or :, as many times as possible 
)                # Closes CG1 
:?               # :, 0 or 1 times 
[ ]              # A single space, only in brackets for visibility
<                # Literally <
(                # Opens CG2 
  [^>]*          # Any character but >, as many times as possible 
)                # Closes CG2
>                # Literally >
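
The annotated form can be used almost verbatim as a pattern in an ignore-whitespace mode. The sketch below uses Python's `re.VERBOSE` as a stand-in for that mode (an assumption; the answer does not name an engine), and the test line with a literal `**` prefix is made up for illustration. The `[ ]` brackets matter here: a bare space would be ignored in verbose mode.

```python
import re

# Verbose-mode version of the annotated pattern (comments shortened).
ANNOTATED = re.compile(
    r"""
    (?:              # optional non-capturing group for the prefix
      ^              # start of line
      \*\*           # literal **
      [^*]*          # anything but *
      \*\*           # literal **
      [ ]            # a single space, kept because it is in brackets
      pam_vas:       # literal pam_vas:
    )?
    \s+              # one or more whitespace characters
    ([^<:]*)         # group 1: the key
    :?               # optional colon
    [ ]              # a single space
    <([^>]*)>        # group 2: the value between the angle brackets
    """,
    re.VERBOSE,
)

# Hypothetical input where the prefix is literally wrapped in ** markers.
STARRED_LOG = (
    "**Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info]** "
    "pam_vas: Authentication <succeeded> for <active directory> "
    "user: <bobtheperson> account: <bobtheperson@com.com> "
    "reason: <N/A> Access cont(upn): <bob>"
)

starred_pairs = [(m.group(1), m.group(2)) for m in ANNOTATED.finditer(STARRED_LOG)]
print(starred_pairs)
```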

Answer 1: (score: 0)

You can achieve this with:

\b                 # a word boundary
(?P<key>[\w(): ]+) # the key part - word characters, (, ), :, spaces
\h+                # at least one whitespace (can be more)
<(?P<value>[^>]+)> # the value part in <> brackets

See a demo on regex101.com. This way, there is no need to ignore anything.

Answer 2: (score: 0)

After talking with someone on the Splunk forum, I ended up with this regex:

\s+([^\:\<\>]+)(?:\:?\s\<)([^\>]+)\>

http://regexr.com/3fbpb
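
As a rough check outside regexr, this Python sketch applies the pattern to the full log line. It assumes Python's `re` behaves like the engine used on regexr and in Splunk for this pattern; `SPLUNK_STYLE`, `FULL_LOG`, and `pairs` are illustrative names.

```python
import re

# Pattern from this answer, copied unchanged.
SPLUNK_STYLE = re.compile(r"\s+([^\:\<\>]+)(?:\:?\s\<)([^\>]+)\>")

FULL_LOG = (
    "Feb 16 20:04:37 hostname su[1111]: [id 123456 auth.info] "
    "pam_vas: Authentication <succeeded> for <active directory> "
    "user: <bobtheperson> account: <bobtheperson@com.com> "
    "reason: <N/A> Access cont(upn): <bob>"
)

# Keys may not contain ':', '<' or '>', so nothing in the prefix can
# stretch across its colons to reach the first '<'.
pairs = [(m.group(1), m.group(2)) for m in SPLUNK_STYLE.finditer(FULL_LOG)]

for key, value in pairs:
    print(f"{key} = {value}")
```

The difference from the original pattern is that the key is a negated character class rather than a lazy `.*?`, so no explicit prefix handling is needed for this log line.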