It seems easy to get the From, To, Subject, etc. via
import email
b = email.message_from_string(a)
bbb = b['from']
ccc = b['to']
assuming "a" is the raw email string, which looks like this.
a = """From root@a1.local.tld Thu Jul 25 19:28:59 2013
Received: from a1.local.tld (localhost [127.0.0.1])
    by a1.local.tld (8.14.4/8.14.4) with ESMTP id r6Q2SxeQ003866
    for <ooo@a1.local.tld>; Thu, 25 Jul 2013 19:28:59 -0700
Received: (from root@localhost)
    by a1.local.tld (8.14.4/8.14.4/Submit) id r6Q2Sxbh003865;
    Thu, 25 Jul 2013 19:28:59 -0700
From: root@a1.local.tld
Subject: oooooooooooooooo
To: ooo@a1.local.tld
Cc:
X-Originating-IP: 192.168.15.127
X-Mailer: Webmin 1.420
Message-Id: <1374805739.3861@a1>
Date: Thu, 25 Jul 2013 19:28:59 -0700 (PDT)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="bound1374805739"

This is a multi-part message in MIME format.

--bound1374805739
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

ooooooooooooooooooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooooooooooooooooooo

--bound1374805739--"""
Question
How do I get the Body of this email with Python?
This is the only code I know of so far, but I have not tested it yet.
if email.is_multipart():
    for part in email.get_payload():
        print(part.get_payload())
else:
    print(email.get_payload())
Is this the right way?
Or maybe there is something simpler, such as...
import email
b = email.message_from_string(a)
bbb = b['body']
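Answer 0 (score: 88)
To be highly confident that you are working with the actual email body (while still possibly not parsing the right part), you have to skip attachments and focus on the plain or HTML part (depending on your needs) for further processing.
Since the attachments mentioned above can be, and very often are, text/plain or text/html parts, this non-bulletproof sample skips them by checking the Content-Disposition header: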
b = email.message_from_string(a)
body = ""
if b.is_multipart():
    for part in b.walk():
        ctype = part.get_content_type()
        cdispo = str(part.get('Content-Disposition'))
        # skip any text/plain (txt) attachments
        if ctype == 'text/plain' and 'attachment' not in cdispo:
            body = part.get_payload(decode=True)  # decode
            break
# not multipart - i.e. plain text, no attachments, keeping fingers crossed
else:
    body = b.get_payload(decode=True)
BTW, walk() iterates nicely over the MIME parts, and get_payload(decode=True) does the dirty work of decoding base64 etc. for you.
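One detail worth illustrating: get_payload(decode=True) undoes the transfer encoding (base64, quoted-printable) but returns bytes, so the charset still has to be applied to get a string. The following is a minimal sketch, not part of the original answer; the sample message, the helper name decode_part and the UTF-8 fallback are assumptions made for illustration.

```python
import email

# Hypothetical sample message: a single base64-encoded UTF-8 text/plain part.
raw = (
    "From: root@a1.local.tld\n"
    "To: ooo@a1.local.tld\n"
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=\"utf-8\"\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "aGVsbG8gd29ybGQ=\n"
)


def decode_part(part, fallback="utf-8"):
    """Return the part's payload as str (hypothetical helper)."""
    # decode=True reverses base64 / quoted-printable and yields bytes;
    # it yields None for multipart containers.
    data = part.get_payload(decode=True)
    if data is None:
        return ""
    # Use the charset declared in Content-Type, or the assumed fallback.
    charset = part.get_content_charset() or fallback
    return data.decode(charset, errors="replace")


msg = email.message_from_string(raw)
text = decode_part(msg)
print(text)
```

A declared charset that Python does not know would still raise LookupError here, so real code may want to catch that as well.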
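Some background - as I implied, the wonderful world of MIME emails presents a lot of pitfalls for finding the message body "wrongly". In the simplest case it is in the sole "text/plain" part and get_payload() is very tempting, but we don't live in a simple world - the body is often wrapped in multipart/alternative, related, mixed, etc. content. Wikipedia describes it tightly (see MIME), but considering that all the cases below are valid - and common - one has to put safety nets all around:
Very common - pretty much what you get from a normal editor (Gmail, Outlook) when sending formatted text with an attachment:
multipart/mixed
 |
 +- multipart/related
 |   |
 |   +- multipart/alternative
 |   |   |
 |   |   +- text/plain
 |   |   +- text/html
 |   |
 |   +- image/png
 |
 +-- application/msexcel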
Relatively simple - just alternative representations:
multipart/alternative
 |
 +- text/plain
 +- text/html
For better or worse, this structure is also valid:
multipart/alternative
 |
 +- text/plain
 +- multipart/related
      |
      +- text/html
      +- image/jpeg
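To make the last layout concrete, here is a small sketch (an illustration, not from the original answer) that builds such a message with the standard EmailMessage API and compares walk() with a plain loop over the top-level get_payload(). The message content and the three placeholder image bytes are made up for the example.

```python
from email.message import EmailMessage

# Hypothetical message reproducing the layout above:
# multipart/alternative -> text/plain + multipart/related(text/html, image/jpeg)
msg = EmailMessage()
msg["Subject"] = "demo"
msg.set_content("plain body")                            # text/plain
msg.add_alternative("<p>html body</p>", subtype="html")  # msg becomes multipart/alternative

# Turn the HTML alternative into multipart/related by attaching an image to it.
html_part = msg.get_payload()[1]
html_part.add_related(b"\xff\xd8\xff", maintype="image", subtype="jpeg")

# walk() visits every part depth-first, containers included.
types = [part.get_content_type() for part in msg.walk()]
print(types)

# A loop over the top-level payload only sees the two direct children,
# so the text/html part nested in multipart/related is not reached.
top_level = [part.get_content_type() for part in msg.get_payload()]
print(top_level)
```

This is why walking the whole tree, rather than only the first level, tends to be the safer default.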
Hope this helps a bit.
P.S. My point is: don't approach email lightly - it bites when you least expect it :)
Answer 1 (score: 68)
b = email.message_from_string(a)
if b.is_multipart():
    for payload in b.get_payload():
        # if payload.is_multipart(): ...
        print(payload.get_payload())
else:
    print(b.get_payload())
Answer 2 (score: 6)
Python 3.6+ provides built-in convenience methods to find and decode the plain-text body part, as in @Todor Minakov's answer. You can use the EmailMessage.get_body() and get_content() methods:
import email
import email.policy
msg = email.message_from_string(s, policy=email.policy.default)
body = msg.get_body(('plain',))
if body is not None:
    body = body.get_content()
print(body)
Note that this gives None if there is no (obvious) plain-text body part.
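As a self-contained illustration of the preference list, the sketch below parses a hypothetical multipart/alternative message (the addresses, boundary and texts are invented) and asks get_body() first for the plain part only, then for HTML with plain as a fallback.

```python
import email
import email.policy

# Hypothetical multipart/alternative message with a plain and an HTML version.
raw = """\
From: root@a1.local.tld
To: ooo@a1.local.tld
Subject: demo
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b1"

--b1
Content-Type: text/plain; charset="utf-8"

plain version

--b1
Content-Type: text/html; charset="utf-8"

<p>html version</p>

--b1--
"""

msg = email.message_from_string(raw, policy=email.policy.default)

# Only accept a text/plain body.
plain = msg.get_body(preferencelist=("plain",))
# Prefer text/html, fall back to text/plain if there is no HTML part.
html_first = msg.get_body(preferencelist=("html", "plain"))

print(plain.get_content_type(), repr(plain.get_content().strip()))
print(html_first.get_content_type(), repr(html_first.get_content().strip()))
```

Either call can return None, so a check before calling get_content() is still needed on real mail.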
If you are reading from, for example, an mbox file, you can give the mailbox constructor an EmailMessage factory:
import mailbox
import email
import email.policy
mbox = mailbox.mbox(mboxfile, factory=lambda f: email.message_from_binary_file(f, policy=email.policy.default), create=False)
for msg in mbox:
    ...
Note that you must pass email.policy.default as the policy, since it is not the default ...
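The effect of the policy argument can be checked directly. In this sketch (the raw message is a made-up example), parsing without a policy yields the legacy email.message.Message class, which has no get_body(), while passing email.policy.default yields an EmailMessage.

```python
import email
import email.policy
from email.message import EmailMessage

# Hypothetical minimal message, used only to compare the two parse results.
raw = "From: root@a1.local.tld\nSubject: demo\n\nbody text\n"

legacy = email.message_from_string(raw)  # no policy given: compat32
modern = email.message_from_string(raw, policy=email.policy.default)

legacy_has_get_body = hasattr(legacy, "get_body")
modern_has_get_body = hasattr(modern, "get_body")

print(type(legacy).__name__, legacy_has_get_body)
print(type(modern).__name__, modern_has_get_body)
```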
Answer 3 (score: 4)
There is no b['body'] in Python. You have to use get_payload.
if isinstance(mailEntity.get_payload(), list):
    for eachPayload in mailEntity.get_payload():
        # ...do things you want...
        # ...real mail body is in eachPayload.get_payload()...
        print(eachPayload.get_payload())
else:
    # ...means there is only text/plain part....
    # ...use mailEntity.get_payload() to get the body...
    print(mailEntity.get_payload())
Good luck.
Answer 4 (score: 2)
There is a very good package for parsing email content, and it comes with proper documentation.
import mailparser
mail = mailparser.parse_from_file(f)
mail = mailparser.parse_from_file_obj(fp)
mail = mailparser.parse_from_string(raw_mail)
mail = mailparser.parse_from_bytes(byte_mail)
How to use:
mail.attachments: list of all attachments
mail.body
mail.to
Answer 5 (score: 0)
If emails is a pandas DataFrame and the emails.message column holds the email text:
## Helper functions
def get_text_from_email(msg):
    '''To get the content from email objects'''
    parts = []
    for part in msg.walk():
        if part.get_content_type() == 'text/plain':
            parts.append( part.get_payload() )
    return ''.join(parts)
def split_email_addresses(line):
    '''To separate multiple email addresses'''
    if line:
        addrs = line.split(',')
        addrs = frozenset(map(lambda x: x.strip(), addrs))
    else:
        addrs = None
    return addrs
import email
# Parse the emails into a list email objects
messages = list(map(email.message_from_string, emails['message']))
emails.drop('message', axis=1, inplace=True)
# Get fields from parsed email objects
keys = messages[0].keys()
for key in keys:
    emails[key] = [doc[key] for doc in messages]
# Parse content from emails
emails['content'] = list(map(get_text_from_email, messages))
# Split multiple email addresses
emails['From'] = emails['From'].map(split_email_addresses)
emails['To'] = emails['To'].map(split_email_addresses)
# Extract the root of 'file' as 'user'
emails['user'] = emails['file'].map(lambda x:x.split('/')[0])
del messages
emails.head()
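One caveat with splitting an address header on commas: a quoted display name such as "Doe, John" contains a comma itself. A possible alternative, sketched below with the standard email.utils.getaddresses function, parses the header instead. This is a swapped-in technique, not what the answer uses; the function name split_email_addresses_rfc and the sample header are hypothetical, and unlike the original helper it returns bare addresses without display names.

```python
from email.utils import getaddresses


def split_email_addresses_rfc(line):
    """Hypothetical variant of split_email_addresses using getaddresses()."""
    if not line:
        return None
    # getaddresses() returns (display_name, address) pairs; keep the addresses.
    return frozenset(addr for _name, addr in getaddresses([line]) if addr)


# Hypothetical header value with a comma inside a quoted display name.
header = '"Doe, John" <john@example.com>, jane@example.com'

naive = frozenset(x.strip() for x in header.split(','))  # comma split, as above
parsed = split_email_addresses_rfc(header)

print(len(naive))      # the quoted name is cut in two by the comma split
print(sorted(parsed))
```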
Answer 6 (score: -2)
Here is the code that works for me every time (for Outlook emails):
# read the subject and body of emails in a folder (or subfolder)
import win32com.client  # import package
# create the MAPI namespace object
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
# get to the desired folder (MyEmail@xyz.com is my root folder;
# 'Inbox' and 'SubFolderName' are the subfolders)
root_folder = outlook.Folders['MyEmail@xyz.com'].Folders['Inbox'].Folders['SubFolderName']
messages = root_folder.Items
for message in messages:
    if message.Unread:  # gets only 'Unread' emails
        subject_content = message.subject  # subject line of the mail
        body_content = message.body  # body of the mail
        print(subject_content)
        print(body_content)
        message.Unread = False  # mark the mail as 'Read'