My situation: I am currently retrieving all the Outlook (2016) emails from my inbox; more specifically, each email contains a table like this one:
¦ Product ¦ Currency ¦ Tenor (months) ¦ Code 1 ¦
¦ MyItem ¦ USD ¦ 12 ¦ AAA01 ¦
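My goal is to grab the body of each email and then store it in an MS SQL Server database.
I struggled to understand the term 'Multipart'; after a couple of (long) hours it is now clearer.
So my process is now:
Check whether the email is Multipart.
If it is: body = part.get_payload(decode=True)
If it is not: body = b.get_payload(decode=True)
So in both cases I use get_payload(decode=True).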
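To make the Multipart flow above concrete, here is a minimal sketch of what such a message looks like from Python's point of view. It assumes Python 3 and the standard `email` package (the code in this post is Python 2), and the message is built locally as a made-up sample rather than fetched from Outlook. It only illustrates that a multipart/alternative email usually carries the same body twice, once as text/plain and once as text/html, and that `get_payload(decode=True)` returns whichever part it is called on.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical sample: one body in two renderings, as many mail clients send it.
msg = MIMEMultipart("alternative")
msg["Subject"] = "RFQ"
msg.attach(MIMEText("Product\nMyItem", "plain", "utf-8"))
msg.attach(MIMEText(
    "<table><tr><td>Product</td></tr><tr><td>MyItem</td></tr></table>",
    "html", "utf-8"))

# walk() yields the multipart container first, then each leaf part.
parts = {}
for part in msg.walk():
    if part.is_multipart():
        continue  # the container itself has no decodable payload
    # decode=True undoes the transfer encoding (base64 here) and returns bytes
    parts[part.get_content_type()] = part.get_payload(decode=True)

for ctype, payload in parts.items():
    print(ctype, payload)
```

Which part ends up in `body` therefore depends only on which content type the loop selects.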
When my email is Multipart, the body shows up in my debugger as plain text:
Product
Currency
Tenor (months)
Code 1
MyItem
USD
12
AAA01
When my email is not Multipart, the body shows up in my debugger with its HTML markup:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>
<body>
<table>
<tr>
<td><b>Product</b></td>
<td><b>Currency</b></td>
<td><b>Tenor (months)</b></td>
<td><b>Code 1</b></td>
</tr>
<tr>
<td>MyItem</td>
<td>USD</td>
<td>12</td>
<td>AAA01</td>
</tr>
</table>
</body>
</html>
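How can I retrieve the body of a Multipart email with its HTML tags instead of as plain text?
I need the HTML markup to identify each header and its corresponding value, so that I can use Beautiful Soup and insert all this data into my MS SQL Server.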
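To illustrate why the HTML markup matters, here is a rough sketch of pairing each header with its value once the table markup is available. It uses the standard library's `html.parser` as a stand-in for Beautiful Soup so that it runs without extra packages, and it assumes Python 3, a single table, and that the first row holds the headers. The names `TableCells` and `table_to_records` are hypothetical helpers, not part of any library.

```python
from html.parser import HTMLParser


class TableCells(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Text outside a cell (newlines between tags) is ignored.
        if self._in_cell:
            self.rows[-1][-1] += data


def table_to_records(html):
    """Return one dict per data row, keyed by the header row's cell text."""
    parser = TableCells()
    parser.feed(html)
    parser.close()
    rows = [[cell.strip() for cell in row] for row in parser.rows if row]
    if not rows:
        return []
    header, data_rows = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data_rows]


sample = """<table>
<tr><td><b>Product</b></td><td><b>Currency</b></td><td><b>Tenor (months)</b></td><td><b>Code 1</b></td></tr>
<tr><td>MyItem</td><td>USD</td><td>12</td><td>AAA01</td></tr>
</table>"""
records = table_to_records(sample)
print(records)
```

Each resulting dict maps a column header to its value, which is the shape a parameterized SQL insert would need. The plain-text rendering loses this row and column structure.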
Thanks for your help in getting me to understand MIMEMultipart better!
Here is my (messy) Python code:
import email
import imaplib

# `app` is assumed to be a Flask application object defined elsewhere.
@app.route('/ps_rfq_imap', methods=['GET', 'POST'])
def ps_rfq_imap():
    # Connection to IMAP/Outlook
    url = 'outlook.mycompayny.com'
    mailbox = imaplib.IMAP4_SSL(url, 993)
    user, password = ('mymail@mycompany.com', 'mypassword')
    mailbox.login(user, password)
    mailbox.list()  # lists all mailbox folders
    mailbox.select('INBOX')  # connected to inbox

    # search() returns message sequence numbers, not UIDs
    typ, data = mailbox.search(None, 'ALL')
    # To get the UIDs of all emails instead:
    # typ, data = mailbox.uid('search', None, 'ALL')
    ids = data[0]
    id_list = ids.split()
    print id_list

    # get the most recent email id
    latest_email_id = int(id_list[-1])

    # iterate from the newest message down to message 1
    for i in range(latest_email_id, 0, -1):
        print 'EMAIL ID:'
        print i
        typ, data = mailbox.fetch(i, '(RFC822)')
        b = email.message_from_string(data[0][1])
        body = ""

        if b.is_multipart():
            email_from = b['from']
            email_subject = b['subject']
            print 'FROM:'
            print email_from
            print 'SUBJECT'
            print email_subject
            for part in b.walk():
                ctype = part.get_content_type()
                cdispo = str(part.get('Content-Disposition'))
                # take the first text/plain part that is not an attachment
                if ctype == 'text/plain' and 'attachment' not in cdispo:
                    body = part.get_payload(decode=True)  # decode
                    print '******************************* MULTIPART body content***********************************'
                    print body
                    break
                elif ctype == 'text/html':
                    print 'HTML PART'
                    continue
        # not multipart - i.e. plain text, no attachments, keeping fingers crossed
        else:
            email_from = b['from']
            email_subject = b['subject']
            print 'FROM:'
            print email_from
            print 'SUBJECT'
            print email_subject
            body = b.get_payload(decode=True)
            print '******************************* SIMMMMMMPPPPLLLLEEEE***********************************'
            print body
    return body
Edit: here is my updated code, which captures only the HTML part of my email, in case it helps someone:
# (this fragment runs inside the `for i in ...` loop shown above)
typ, data = mailbox.fetch(i, '(RFC822)')
b = email.message_from_string(data[0][1])
body = ""
if b.is_multipart():
    email_from = b['from']
    email_subject = b['subject']
    for part in b.walk():
        ctype = part.get_content_type()
        cdispo = str(part.get('Content-Disposition'))
        # skip the text/plain body; only the HTML part is wanted
        if ctype == 'text/plain' and 'attachment' not in cdispo:
            continue
        elif ctype == 'text/html':
            print 'HTML PART'
            body = part.get_payload(decode=True)  # decode
            soup = BeautifulSoup(body, "html.parser")
            # find_all() returns a (possibly empty) list, never None
            if soup.find_all('meta'):
                print 'WE HAVE FOUND THE BODY******************** Time to process it with BS for getting the value of the table'
                tables = soup.find_all('table')
            continue
# not multipart - i.e. plain text, no attachments, keeping fingers crossed
else:
    continue
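For readers on Python 3, the same idea can be sketched as one small helper that covers both the Multipart and the non-Multipart case. This is only an illustration under some assumptions: the raw message arrives as bytes (as `imaplib` returns it on Python 3), the first text/html part that is not an attachment is the one wanted, and `extract_html` is a hypothetical name. The sample messages are built locally rather than fetched from a server.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def extract_html(raw_bytes):
    """Return the first non-attachment text/html part as str, or None."""
    msg = email.message_from_bytes(raw_bytes)
    # walk() also works on a non-multipart message: it yields the message itself.
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        if "attachment" in str(part.get("Content-Disposition")):
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
        charset = part.get_content_charset() or "utf-8"
        return payload.decode(charset, errors="replace")
    return None


# Hypothetical sample messages standing in for fetched emails.
html = "<html><body><table><tr><td>MyItem</td></tr></table></body></html>"
multi = MIMEMultipart("alternative")
multi.attach(MIMEText("MyItem", "plain", "utf-8"))
multi.attach(MIMEText(html, "html", "utf-8"))
single = MIMEText(html, "html", "utf-8")
plain_only = MIMEText("MyItem", "plain", "utf-8")

html_multi = extract_html(multi.as_bytes())
html_single = extract_html(single.as_bytes())
html_plain = extract_html(plain_only.as_bytes())
print(html_multi)
print(html_single)
print(html_plain)
```

Because `walk()` yields the message itself when it is not Multipart, no separate `is_multipart()` branch is needed in this version.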
Best regards,