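I am trying to download all of the data contained in my Outlook emails to a CSV file. This is to build a dataset for a university project. I want to create a CSV from the data in every header of each email plus the email body.
I am using the imaplib and email libraries. I can connect to my mailbox, select the folder I need, pick out individual headers (e.g. "To", "From", "Subject") and append each header to its own list. What I actually want, however, is to iterate over each email and append the data from every header and from the email body to a single list.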
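import email
from bs4 import BeautifulSoup
import pandas as pd
# `mail` is the imaplib connection described above (already logged in, folder selected)
# select the ids for each email in the inbox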
result, data = mail.uid('search', None, 'ALL')
mail_ids = data[0]
id_list = mail_ids.split()
print(id_list)
date_list = []
from_list = []
to_list = []
message_text = []
subject_list = []
keyword_list = []
# select the first email id and the latest email id
first_email_id = id_list[0]
latest_email_id = id_list[-1]
# iterate over each email and fetch the full message (RFC822 format)
for item in id_list:
    # fetch the current id (item); using latest_email_id here reads the same email on every pass
    result2, email_data = mail.uid('fetch', item, '(RFC822)')
    # parse the raw bytes directly; decoding everything as UTF-8 first can fail on other charsets
    email_message = email.message_from_bytes(email_data[0][1])
    to_ = email_message['To']
    from_ = email_message['From']
    subject_ = email_message['Subject']
    categories_ = email_message['Keywords']
    date_ = email_message['Date']
    # record the headers once per email rather than once per MIME part
    to_list.append(to_)
    from_list.append(from_)
    #date_list.append(date_)
    #date_list = pd.to_datetime(date_list)
    print(subject_)
    print(categories_)
    #print(email_message.get_content_type())
    # getting the email message content
    counter = 1
    for part in email_message.walk():
        # multipart containers hold no content of their own
        if part.get_content_maintype() == "multipart":
            continue
        filename = part.get_filename()
        if not filename:
            ext = '.html'
            filename = 'msg-part-%08d%s' % (counter, ext)
        counter += 1
        #"""
        #save file
        #"""
        content_type = part.get_content_type()
        print(content_type)
        # decode=True undoes base64/quoted-printable and returns bytes (or None)
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        body = payload.decode(charset, errors="replace")
        if "plain" in content_type:
            print(body)
            # the original called msg.get_payload(), but msg is never defined
            message_text.append(body)
        elif "html" in content_type:
            soup = BeautifulSoup(body, "html.parser")
            text = soup.get_text()
            print(text)
            message_text.append(text)
        else:
            print(content_type)
I don't want to list each header and append it to a separate list. I want to iterate over each email:
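For illustration, here is a minimal sketch of the "one row per email" idea, using only the standard library. It assumes the raw bytes come from the same `mail.uid('fetch', ...)` call as above; the `raw` value below is a made-up stand-in, and the helper names `message_to_row` and `rows_to_csv` are hypothetical, not part of imaplib or email. Each email becomes one dict holding every header it has plus a `Body` key, and the CSV columns are the union of all header names seen.

```python
import csv
import io
from email import message_from_bytes, policy


def message_to_row(msg):
    """Flatten one parsed email into a dict: every header plus a 'Body' key."""
    row = {}
    for name, value in msg.items():
        # Repeated headers (e.g. Received) are joined instead of overwritten.
        row[name] = f"{row[name]} | {value}" if name in row else str(value)
    # Prefer the plain-text body, fall back to HTML if that is all there is.
    body = msg.get_body(preferencelist=("plain", "html"))
    row["Body"] = body.get_content() if body is not None else ""
    return row


def rows_to_csv(rows):
    """Return CSV text whose columns are the union of all keys, in first-seen order."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up stand-in for email_data[0][1] from mail.uid('fetch', item, '(RFC822)').
raw = b"From: a@example.com\r\nTo: b@example.com\r\nSubject: Hi\r\n\r\nHello there\r\n"

# policy.default yields an EmailMessage, which provides get_body()/get_content().
msg = message_from_bytes(raw, policy=policy.default)
rows = [message_to_row(msg)]
csv_text = rows_to_csv(rows)
```

In the real loop, `rows.append(message_to_row(...))` would run once per id in `id_list`, and `csv_text` could be written to a file opened with `newline=""`. If only an HTML body exists, `Body` will contain raw markup, which could be passed through BeautifulSoup as in the code above.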