我正在尝试监视一个网络钓鱼收件箱,该收件箱既可以接收普通电子邮件(即带有潜在附件的HTML /文本),也可以接收附有.MSG文件的电子邮件。
目标是让用户向phishing@company.com发送电子邮件,一旦我解析了各种链接(可能是恶意的)以及附件(也可能是恶意的),我将对其进行一些分析。
我遇到的问题是附加的.msg文件的正文。
使用下面的代码,我可以将原始电子邮件中的“至”,“从”,“主题”以及所有链接拉到。它还会拉下.msg文件的所有附件(即在我的测试中,我能够拉下.msg内的PDF)。但是,我无法获得.msg文件的任何收件人,主题或正文。
当我将其原始打印时,我会以非常难看的格式获得其中的一些内容,但是显然,由于包含多个部分,我在获取该信息方面做错了事。
我对Python还是很陌生,所以将不胜感激。
import imaplib
import base64
import os
import email
from bs4 import BeautifulSoup
server = 'mail.server.com'
email_user = 'phishing@company.com'
email_pass = 'XXXXXXXXXXXX'
output_dir = '/tmp/attachments/'
body = ""
def get_body(msg):
if msg.is_multipart():
return get_body(msg.get_payload(0))
else:
return msg.get_payload(None, True)
def get_attachments(msg):
for part in msg.walk():
if part.get_content_maintype()=='multipart':
continue
if part.get('Content-Disposition') is None:
continue
fileName = part.get_filename()
if bool(fileName):
filePath = os.path.join(output_dir, fileName)
with open(filePath,'wb') as f:
f.write(part.get_payload(decode=True))
mail = imaplib.IMAP4_SSL(server)
mail.login(email_user, email_pass)
mail.select('INBOX')
result, data = mail.search(None, 'UNSEEN')
mail_ids = data[0]
id_list = mail_ids.split()
print(id_list)
for emailid in id_list:
result, email_data = mail.fetch(emailid, '(RFC822)')
raw_email = email_data[0][1]
raw_email_string = raw_email.decode('utf-8')
email_message = email.message_from_string(raw_email_string)
email_from = str(email.header.make_header(email.header.decode_header(email_message['From'])))
email_to = str(email.header.make_header(email.header.decode_header(email_message['To'])))
subject = str(email.header.make_header(email.header.decode_header(email_message['Subject'])))
print('From: ' + email_from)
print('To: ' + email_to)
print('Subject: ' + subject)
get_attachments(raw_email)
for part in email_message.walk():
body = part.get_payload(0)
content = body.get_payload(decode=True)
soup = BeautifulSoup(content, 'html.parser')
for link in soup.find_all('a'):
print('Link: ' + link.get('href'))
break
答案 0 :(得分:0)
我使用以下代码来完成此工作。我基本上必须在.msg步内进行多个for循环,然后才在text / html部分中提取相关信息。
for emailid in id_list:
result, data = mail.fetch(emailid, '(RFC822)')
raw = email.message_from_bytes(data[0][1])
get_attachments(raw)
#print(raw)
header_from = mail.fetch(emailid, "(BODY[HEADER.FIELDS (FROM)])")
header_from_str = str(header_from)
mail_from = re.search('From:\s.+<(\S+)>', header_from_str)
header_subject = mail.fetch(emailid, "(BODY[HEADER.FIELDS (SUBJECT)])")
header_subject_str = str(header_subject)
mail_subject = re.search('Subject:\s(.+)\'\)', header_subject_str)
#mail_body = mail.fetch(emailid, "(BODY[TEXT])")
print(mail_from.group(1))
print(mail_subject.group(1))
for part in raw.walk():
if part.get_content_type() == 'message/rfc822':
part_string = str(part)
original_from = re.search('From:\s.+<(\S+)>\n', part_string)
original_to = re.search('To:\s.+<(\S+)>\n', part_string)
original_subject = re.search('Subject:\s(.+)\n', part_string)
print(original_from.group(1))
print(original_to.group(1))
print(original_subject.group(1))
if part.get_content_type() == 'text/html':
content = part.get_payload(decode=True)
#print(content)
soup = BeautifulSoup(content, 'html.parser')
for link in soup.find_all('a'):
print('Link: ' + link.get('href'))