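I am trying to search for a specific line in an email body. I have already been able to extract the whole email body, and now I want to pull one particular line out of it. My code so far: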
resp, items = conn.uid("search", None, 'All')
items = items[0].split()
for emailid in items:
    resp, data = conn.uid("fetch", emailid, "(RFC822)")
    if resp == 'OK':
        email_body = data[0][1].decode('utf-8')
        mail = email.message_from_string(email_body)
        if mail["Subject"].find("PA1") > 0 or mail["Subject"].find("PA2") > 0:
            regex = r"(\bEvent demon log entry:)(?:\r?\n|\r)+(\[[^]]+\].*)"
            a = re.findall(regex, email_body, re.IGNORECASE)
I now get these lines:
[(u'Event demon log entry:', u'[27/12/2018 05:29:30] CAUAJM_I_40245 EVENT: ALARM ALARM: JO=\r')]
[(u'Event demon log entry:', u'[27/12/2018 04:58:05] CAUAJM_I_40245 EVENT: ALARM ALARM: JOBFAILURE JOB: p2=\r')]
[(u'Event demon log entry:', u'[27/12/2018 06:00:03] CAUAJM_I_40245 EVENT: ALARM ALARM: JO=\r')]
[(u'Event demon log entry:', u'[27/12/2018 07:00:05] CAUAJM_I_40245 EVENT: ALARM ALARM: JO=\r')]
But I would like to get everything between [(u'Event demon log entry:', u'[27/12/2018 05:29:30] and EVENT: ALARM ALARM: JO=\r')].
Desired output:
CAUAJM_I_40245 EVENT
The original text in the email body:
Event demon log entry:
[27/12/2018 04:48:17] CAUAJM_I_40245 EVENT: ALARM ALARM: JOBFAILURE JOB: bx_p2_reporting EXITCODE: 1
Update:
It turns out I need to get the following:
JOB: bx_p2_reporting EXITCODE: 1
from
Event demon log entry:
[26/12/2018 20:17:14] CAUAJM_I_40245 EVENT: ALARM ALARM: JOBFAILURE JOB: p2=
_batch_excel_RevalFutBasisSpdCalc_NY3pm MACHINE: ldnmdsbatchxl01 EXITCODE: =
268438455
Answer 0 (score: 2)
You can use
r'Event demon log entry:[\r\n]*\[[^]]+]\s*(.*?)\s*EVENT: ALARM'
See the regex demo.
If you use it with re.findall, you should get only CAUAJM_I_40245, because the pattern contains a single capturing group.
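Details
Event demon log entry: - a literal substring
[\r\n]* - 0+ CR or LF characters
\[ - a [ character
[^]]+ - 1 or more characters other than ]
] - a ] character
\s* - 0+ whitespace characters
(.*?) - Group 1: any zero or more characters other than line break characters, as few as possible
\s* - 0+ whitespace characters
EVENT: ALARM - a literal substring.
import re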
rx = r"Event demon log entry:[\r\n]*\[[^]]+]\s*(.*?)\s*EVENT: ALARM"
s = "Event demon log entry:\n\n[27/12/2018 04:48:17] CAUAJM_I_40245 EVENT: ALARM ALARM: JOBFAILURE JOB: bx_p2_reporting EXITCODE: 1"
print(re.findall(rx, s, re.IGNORECASE))
# => ['CAUAJM_I_40245']
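For the update in the question, the trailing `=` at the ends of lines (seen as `=\r` in the output) looks like quoted-printable soft line breaks, which would explain why the matches stop at `JO=` or `p2=`. The sketch below assumes that is the case and that the message has a `text/plain` part: it lets the `email` module undo the transfer encoding first, then runs an extended version of the pattern on the decoded text. `RAW`, `decoded_text`, `extract_alarm` and `ALARM_RX` are hypothetical names used only for illustration, and the sample message is modelled on the text in the question, not taken from a real mailbox.

```python
import email
import re

# Hypothetical raw message, modelled on the sample in the question's update.
# Each "=\r\n" is a quoted-printable soft line break (assumed, not confirmed).
RAW = (
    "Subject: RE: PA1 alarm\r\n"
    "Content-Type: text/plain; charset=utf-8\r\n"
    "Content-Transfer-Encoding: quoted-printable\r\n"
    "\r\n"
    "Event demon log entry:\r\n"
    "[26/12/2018 20:17:14] CAUAJM_I_40245 EVENT: ALARM ALARM: JOBFAILURE JOB: p2=\r\n"
    "_batch_excel_RevalFutBasisSpdCalc_NY3pm MACHINE: ldnmdsbatchxl01 EXITCODE: =\r\n"
    "268438455\r\n"
)

# Same start as the answer's pattern, extended to also capture the job name
# and the exit code from the same log line.
ALARM_RX = re.compile(
    r"Event demon log entry:[\r\n]*\[[^]]+]\s*(\S+)\s+EVENT: ALARM"
    r".*?JOB:\s*(\S+).*?EXITCODE:\s*(\d+)",
    re.IGNORECASE,
)


def decoded_text(raw):
    """Return the first text/plain part with its transfer encoding undone."""
    msg = email.message_from_string(raw)
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # bytes, soft breaks removed
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


def extract_alarm(text):
    """Return (message code, job name, exit code) or None if nothing matches."""
    m = ALARM_RX.search(text)
    return m.groups() if m else None


body = decoded_text(RAW)
result = extract_alarm(body)
print(result)
```

In the question's loop, this would mean searching the decoded text instead of the raw `email_body` string. Whether the real messages are quoted-printable can be checked by printing `mail["Content-Transfer-Encoding"]`.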