Sorry if this is trivial; I'm new to Python. Some emails contain the following lines in the body:
Event demon log entry:
[27/12/2018 08:15:02] CAUAJM_I_40245 EVENT: ALARM ALARM: MAXRUNALARM JOB: p1_credit_qv_curve_snap MACHINE: p1prog06
Using this code:
#!/usr/bin/python
import email, imaplib, re

user = 'user@example.com'
pwd = 'pass'

conn = imaplib.IMAP4_SSL("outlook.office365.com")
conn.login(user, pwd)
conn.select("Inbox")

resp, items = conn.uid("search", None, 'All')
items = items[0].split()
for emailid in items:
    resp, data = conn.uid("fetch", emailid, "(RFC822)")
    if resp == 'OK':
        email_body = data[0][1].decode('utf-8')
        mail = email.message_from_string(email_body)
        subject = mail["Subject"] or ""
        # str.find() returns 0 for a match at the start and -1 when absent,
        # so "> 0" would miss subjects that begin with PA1/PA2; test membership instead
        if "PA1" in subject or "PA2" in subject:
            match = re.findall(r'Event demon log entry.*\n.*\n.*', email_body, re.IGNORECASE)
            print(match)
I get:
[u'Event demon log entry:\r\n\r\n[27/12/2018 08:15:02] CAUAJM_I_40245 EVENT: ALARM ALARM: MAXRUNALARM JOB: p=\r', u'Event demon log entry:<br><br=\r\n>[27/12/2018 08:15:02] CAUAJM_I_40245 EVENT: ALARM ALARM: M=\r\nAXRUNALARM JOB: p1_credit_qv_curve_snap MACHINE: p1prog06<br><br>Attac=\r']
How can I get rid of this HTML in the output?
I need the following output (on a single line, if possible):
Event demon log entry:[27/12/2018 08:15:02] CAUAJM_I_40245 EVENT: ALARM ALARM: MAXRUNALARM JOB: p1_credit_qv_curve_snap MACHINE: p1prog06
Answer 0 (score: 0)
You could use two capture groups:
(\bEvent demon log entry:)(?:\r?\n|\r)+(\[[^]]+\].*)
See the regex demo | Python demo.
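This matches:
- (\bEvent demon log entry:) captures the header text in group 1
- (?:\r?\n|\r)+ matches one or more line breaks (or use {2} instead of + to match exactly two)
- (\[[^]]+\].*) captures in group 2: an opening [, then a negated character class matching any character except ], then the closing ], then any character except a newline 0+ times

For example, using findall: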
import re

regex = r"(\bEvent demon log entry:)(?:\r?\n|\r)+(\[[^]]+\].*)"
email_body = ("Event demon log entry:\n\n"
              "[27/12/2018 08:15:02] CAUAJM_I_40245 EVENT: ALARM ALARM: MAXRUNALARM JOB: p1_credit_qv_curve_snap MACHINE: p1prog06")

# Join group 1 (the header) and group 2 (the bracketed log line) into one line
for g1, g2 in re.findall(regex, email_body, re.IGNORECASE):
    print(g1 + g2)
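The example above runs on already-clean text. The output in the question, however, suggests that email_body is the raw RFC 822 source. The "=\r\n" sequences look like quoted-printable soft line breaks, and the second match (with <br>) appears to come from the message's HTML part. One possible approach is to let the email module decode the text/plain part first and then apply the same regex. This is a minimal sketch that assumes Python 3 and a message with a text/plain part. The helper names plain_text_body and extract_alarm_lines and the SAMPLE message are hypothetical:

```python
import email
import re

# The answer's pattern: header, one or more line breaks, then the "[timestamp] ..." line.
ALARM_RE = re.compile(r"(\bEvent demon log entry:)(?:\r?\n|\r)+(\[[^]]+\].*)",
                      re.IGNORECASE)

# Hypothetical test message: text/plain, quoted-printable, with a soft line break
# ("=" at the end of a line) splitting MAXRUNALARM as in the question's output.
SAMPLE = (
    b"Subject: PA1 alarm\n"
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b"Event demon log entry:\n"
    b"\n"
    b"[27/12/2018 08:15:02] CAUAJM_I_40245 EVENT: ALARM ALARM: M=\n"
    b"AXRUNALARM JOB: p1_credit_qv_curve_snap MACHINE: p1prog06\n"
)


def plain_text_body(raw_message):
    """Return the decoded text of the first text/plain part of a raw message (bytes)."""
    msg = email.message_from_bytes(raw_message)
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            # decode=True undoes the Content-Transfer-Encoding (quoted-printable here),
            # which removes the "=" soft line breaks.
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""  # no text/plain part (e.g. HTML-only mail)


def extract_alarm_lines(raw_message):
    """Return each header + log line pair joined into a single line."""
    body = plain_text_body(raw_message)
    # rstrip() drops a trailing "\r" that ".*" keeps when the body uses CRLF line endings.
    return [(g1 + g2).rstrip() for g1, g2 in ALARM_RE.findall(body)]


if __name__ == "__main__":
    for line in extract_alarm_lines(SAMPLE):
        print(line)
```

In the question's loop, data[0][1] is already bytes under Python 3. It can be passed directly to extract_alarm_lines instead of decoding and searching the whole raw message. Because walk() yields only the first text/plain part, the HTML copy of the same text is never matched.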