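Get a substring from a list

Time: 2018-12-27 11:06:32

Tags: python regex list

I am trying to search an email body for a specific line. I can already extract the whole email body; now I want to pull one particular line out of it. My code so far: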

import email
import re

# conn is assumed to be an authenticated imaplib connection with a mailbox selected
resp, items = conn.uid("search", None, 'All')
items = items[0].split()
for emailid in items:
    resp, data = conn.uid("fetch", emailid, "(RFC822)")
    if resp == 'OK':
        email_body = data[0][1].decode('utf-8')
        mail = email.message_from_string(email_body)
        # "in" also matches at index 0, which find(...) > 0 would miss
        subject = mail["Subject"] or ""
        if "PA1" in subject or "PA2" in subject:
            regex = r"(\bEvent demon log entry:)(?:\r?\n|\r)+(\[[^]]+\].*)"
            a = re.findall(regex, email_body, re.IGNORECASE)

This is what I currently get:

[(u'Event demon log entry:', u'[27/12/2018 05:29:30]      CAUAJM_I_40245 EVENT: ALARM            ALARM: JO=\r')]
[(u'Event demon log entry:', u'[27/12/2018 04:58:05] CAUAJM_I_40245 EVENT: ALARM ALARM: JOBFAILURE JOB: p2=\r')]
[(u'Event demon log entry:', u'[27/12/2018 06:00:03]      CAUAJM_I_40245 EVENT: ALARM            ALARM: JO=\r')]
[(u'Event demon log entry:', u'[27/12/2018 07:00:05]      CAUAJM_I_40245 EVENT: ALARM            ALARM: JO=\r')]

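But I want to get everything that sits between the timestamp and EVENT: ALARM, i.e. the part left out of [(u'Event demon log entry:', u'[27/12/2018 05:29:30]EVENT: ALARM ALARM: JO=\r')]

Desired output: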

CAUAJM_I_40245 EVENT

Raw text in the email body:

Event demon log entry:

[27/12/2018 04:48:17]      CAUAJM_I_40245 EVENT: ALARM            ALARM: JOBFAILURE       JOB: bx_p2_reporting EXITCODE:  1

Update:

It turns out I need to get the following information:

JOB: bx_p2_reporting EXITCODE:  1

from

Event demon log entry:

[26/12/2018 20:17:14] CAUAJM_I_40245 EVENT: ALARM ALARM: JOBFAILURE JOB: p2=
_batch_excel_RevalFutBasisSpdCalc_NY3pm MACHINE: ldnmdsbatchxl01 EXITCODE: =
268438455

1 answer:

Answer 0 (score: 2)

You can use

r'Event demon log entry:[\r\n]*\[[^]]+]\s*(.*?)\s*EVENT: ALARM'

See the regex demo.

If you use it with re.findall, you should get only CAUAJM_I_40245, because the pattern contains a single capturing group.
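To illustrate why the result shape changes, here is a minimal sketch of how re.findall treats capturing groups: with two groups it returns a list of tuples (as in the question's output), and with one group it returns a list of plain strings. The variable names and the shortened sample string are only for illustration.

```python
import re

# Shortened sample based on the email body shown in the question
s = "Event demon log entry:\n\n[27/12/2018 04:48:17]      CAUAJM_I_40245 EVENT: ALARM"

# Two capturing groups: each match is a tuple with one item per group
two_groups = re.findall(r"(Event demon log entry:)\s*(\[[^]]+\])", s)

# One capturing group: each match is just the text of that group
one_group = re.findall(r"Event demon log entry:\s*\[[^]]+\]\s*(.*?)\s*EVENT: ALARM", s)

print(two_groups)
print(one_group)
```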

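Details

  • Event demon log entry: - a literal substring
  • [\r\n]* - zero or more CR or LF characters
  • \[ - a [ character
  • [^]]+ - one or more characters other than ]
  • ] - a ] character
  • \s* - zero or more whitespace characters
  • (.*?) - Group 1: any zero or more characters other than line break characters, as few as possible
  • \s* - zero or more whitespace characters
  • EVENT: ALARM - a literal substring.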

Python demo

import re
rx = r"Event demon log entry:[\r\n]*\[[^]]+]\s*(.*?)\s*EVENT: ALARM"
s = "Event demon log entry:\n\n[27/12/2018 04:48:17]      CAUAJM_I_40245 EVENT: ALARM            ALARM: JOBFAILURE       JOB: bx_p2_reporting EXITCODE:  1"
print(re.findall(rx, s, re.IGNORECASE))
# => ['CAUAJM_I_40245']
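
For the update in the question (getting the JOB and EXITCODE values), one possible approach is sketched below. It assumes the trailing = at the end of each line in the question's output is a quoted-printable soft line break, which would explain why the matches stop at JO=. Under that assumption, decoding the body with the standard quopri module rejoins the wrapped line before the regex runs. The names JOB_RX and extract_job_failures are hypothetical, and the pattern is inferred only from the two samples shown in the question.

```python
import quopri
import re

# Hypothetical pattern, inferred from the samples in the question:
# group 1 is the job name, group 2 is the numeric exit code.
JOB_RX = re.compile(
    r"Event demon log entry:\s*\[[^\]]+\].*?\bJOB:\s*(\S+).*?\bEXITCODE:\s*(\d+)",
    re.IGNORECASE,
)

def extract_job_failures(raw_body):
    """Return (job, exit_code) pairs found in a quoted-printable text body."""
    # Undo quoted-printable encoding so "=\r\n" soft line breaks disappear
    decoded = quopri.decodestring(raw_body.encode("utf-8")).decode("utf-8", errors="replace")
    return JOB_RX.findall(decoded)

# Sample rebuilt from the wrapped text in the question (CRLF line endings assumed)
sample = (
    "Event demon log entry:\r\n\r\n"
    "[26/12/2018 20:17:14] CAUAJM_I_40245 EVENT: ALARM ALARM: JOBFAILURE JOB: p2=\r\n"
    "_batch_excel_RevalFutBasisSpdCalc_NY3pm MACHINE: ldnmdsbatchxl01 EXITCODE: =\r\n"
    "268438455\r\n"
)
print(extract_job_failures(sample))
# => [('p2_batch_excel_RevalFutBasisSpdCalc_NY3pm', '268438455')]
```

Decoding the whole raw message this way is crude. For a non-multipart message, mail.get_payload(decode=True) would likely be a cleaner way to obtain the decoded body before applying the regex.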