Capturing things inside a .*? regex

Time: 2014-08-27 17:37:28

Tags: python regex

I have a log entry like the following:

  

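EVENT: "[INIT]WinEvtLog: Security: AUDIT_SUCCESS(528): Security: Administrator: AMAZON-D071A6F8: AMAZON-D071A6F8: Successful Logon: User Name: Administrator Domain: AMAZON-D071A6F8 Logon ID: (0x0,0x1054A66) Logon Type: 10 Logon Process: User32 Authentication Package: Negotiate Workstation Name: AMAZON-D071A6F8 Logon GUID: - Caller User Name: AMAZON-D071A6F8$ Caller Domain: WORKGROUP Caller Logon ID: (0x0,0x3E7) Caller Process ID: 968 Transited Services: - Source Network Address: 10.0.0.200 Source Port: 60054 [END]";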

I capture the log with this regex:

EVENT:\s\"\[INIT\](?P<log>.*?)\[END\]\";

I do it this way because I want to display the whole EVENT later.

There are also a few things inside (?P<log>) that I want to capture. For example:

Source\sPort:\s(?P<src_port>\d+)
Source\sNetwork\sAddress:\s(?P<src_network_addr>\S+)

as well as other fields in the EVENT.

I'm not sure how to write a regex that captures the whole EVENT as well as the individual bits inside the EVENT.

2 Answers:

Answer 0: (score: 2)

Put the capture groups inside another capture group:

EVENT:\s\"\[INIT\](?P<log>.*?Source\sNetwork\sAddress:\s(?P<src_network_addr>\S+).*?Source\sPort:\s(?P<src_port>\d+).*?)\[END\]\"


The regex above captures log, and also the src_network_addr and src_port that appear inside log.
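A minimal sketch of how this pattern might be used from Python. The sample line is a shortened version of the log in the question, and the names EVENT_RE and line are only illustrative:

```python
import re

# Pattern from the answer above: the two inner named groups sit inside "log".
EVENT_RE = re.compile(
    r'EVENT:\s\"\[INIT\](?P<log>.*?Source\sNetwork\sAddress:\s(?P<src_network_addr>\S+)'
    r'.*?Source\sPort:\s(?P<src_port>\d+).*?)\[END\]\"'
)

# Shortened sample line (assumption: only the fields the pattern needs are kept).
line = ('EVENT: "[INIT]WinEvtLog: Security: AUDIT_SUCCESS(528): '
        'Source Network Address: 10.0.0.200 Source Port: 60054 [END]";')

m = EVENT_RE.search(line)
if m:
    print(m.group('src_network_addr'))  # 10.0.0.200
    print(m.group('src_port'))          # 60054
    print(m.group('log'))               # the full text between [INIT] and [END]
```

Because the inner groups are mandatory and in a fixed order, this pattern only matches events that contain both Source Network Address and Source Port, in that order.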

Answer 1: (score: 1)

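The regex listed below will match any event log that starts with EVENT: "[INIT] and ends with [END]";. If any of the phrases of interest appear in the event log, they are captured as well.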

Note the use of nested capture groups: (?P<log>...(?P<src_port>...)...). The outer group captures its entire pattern, including anything captured by the inner groups.

Note also that any group that did not participate in the match still appears in the result dict, with a value of None.
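Before the full listing, both notes can be seen in a tiny standalone pattern. This is only an illustration; the group names outer and inner and the test strings are hypothetical:

```python
import re

# Hypothetical pattern: "inner" is nested inside "outer" and is optional.
pattern = re.compile(r'(?P<outer>a(?P<inner>b+)?c)')

with_inner = pattern.match('abbc').groupdict()
without_inner = pattern.match('ac').groupdict()

print(with_inner)     # {'outer': 'abbc', 'inner': 'bb'}
print(without_inner)  # {'outer': 'ac', 'inner': None}

# groupdict() accepts a default value to use instead of None.
with_default = pattern.match('ac').groupdict(default='')
print(with_default)   # {'outer': 'ac', 'inner': ''}
```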

import re
from pprint import pprint


texts=[
    'EVENT: "[INIT]WinEvtLog: Security: AUDIT_SUCCESS(528): Security: Administrator: AMAZON-D071A6F8: AMAZON-D071A6F8: Successful Logon: User Name: Administrator Domain: AMAZON-D071A6F8 Logon ID: (0x0,0x1054A66) Logon Type: 10 Logon Process: User32 Authentication Package: Negotiate Workstation Name: AMAZON-D071A6F8 Logon GUID: - Caller User Name: AMAZON-D071A6F8$ Caller Domain: WORKGROUP Caller Logon ID: (0x0,0x3E7) Caller Process ID: 968 Transited Services: - Source Network Address: 10.0.0.200 Source Port: 60054 [END]";',
    'EVENT: "[INIT]Random text with one match Source Port: 60054 And stuff at end [END]";',
    'EVENT: "[INIT]Random text with no matches [END]";']


for text in texts:
    match = re.match(
        r'''
          EVENT:\s"\[INIT]                     # anchor from beginning
          (?P<log>                             # record entire entry
            (?:                                # consisting of:
              (?:Source\sNetwork\sAddress:\s   #  src_network_address
                (?P<src_network_address>\S+))
              |                                # OR
              (?:Source\sPort:\s               #  src_port
                (?P<src_port>\S+))
              |                                # OR
              .*?                              #  anything else
            )*                                 # as many times as required
          )
          \s\[END]";$                          # anchor at end
        ''',
        text,
        # Verbose mode is passed as a flag: recent Python versions only
        # accept an inline (?x) at the very start of the pattern.
        re.VERBOSE)
    if match:
        pprint(match.groupdict())

Result:

{'log': 'WinEvtLog: Security: AUDIT_SUCCESS(528): Security: Administrator: AMAZON-D071A6F8: AMAZON-D071A6F8: Successful Logon: User Name: Administrator Domain: AMAZON-D071A6F8 Logon ID: (0x0,0x1054A66) Logon Type: 10 Logon Process: User32 Authentication Package: Negotiate Workstation Name: AMAZON-D071A6F8 Logon GUID: - Caller User Name: AMAZON-D071A6F8$ Caller Domain: WORKGROUP Caller Logon ID: (0x0,0x3E7) Caller Process ID: 968 Transited Services: - Source Network Address: 10.0.0.200 Source Port: 60054',
 'src_network_address': '10.0.0.200',
 'src_port': '60054'}
{'log': 'Random text with one match Source Port: 60054 And stuff at end',
 'src_network_address': None,
 'src_port': '60054'}
{'log': 'Random text with no matches',
 'src_network_address': None,
 'src_port': None}
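Neither answer proposes it, but a two-pass approach is another option: capture the whole event with the regex from the question, then search the captured text for each field separately. A minimal sketch, where parse_event, EVENT_RE and FIELD_RES are hypothetical names introduced here:

```python
import re

# Outer pattern from the question: grabs everything between [INIT] and [END].
EVENT_RE = re.compile(r'EVENT:\s\"\[INIT\](?P<log>.*?)\[END\]\";')

# Field patterns from the question, searched independently inside the log text.
FIELD_RES = {
    'src_port': re.compile(r'Source\sPort:\s(\d+)'),
    'src_network_addr': re.compile(r'Source\sNetwork\sAddress:\s(\S+)'),
}


def parse_event(line):
    """Return a dict with 'log' plus each field (None if absent), or None if no event."""
    event = EVENT_RE.search(line)
    if event is None:
        return None
    log = event.group('log')
    result = {'log': log}
    for name, field_re in FIELD_RES.items():
        found = field_re.search(log)
        result[name] = found.group(1) if found else None
    return result


print(parse_event('EVENT: "[INIT]Random text Source Port: 60054 And stuff at end [END]";'))
```

With this layout the fields may appear in any order, and a new field only needs one more entry in FIELD_RES. The cost is one extra search per field.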