Trying to extract the "Reply-To" header field in Python, but not getting the email address

Posted: 2016-12-06 14:40:47

Tags: python gmail imaplib

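I tried to adapt this script, which I found by searching Google. It worked perfectly on previously received emails when it extracted the "From" field directly, and I got no errors.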

Here is my code:

#!/usr/bin/python

import imaplib
import sys
import email
import re

#FOLDER=sys.argv[1]
FOLDER='folder'
LOGIN='login@gmail.com'
PASSWORD='password'
IMAP_HOST = 'imap.gmail.com'  # Change this according to your provider

email_list = []
email_unique = []

mail = imaplib.IMAP4_SSL(IMAP_HOST)
mail.login(LOGIN, PASSWORD)
mail.select(FOLDER)

result, data = mail.search(None, 'ALL')
ids = data[0]
id_list = ids.split()
for i in id_list:
    typ, data = mail.fetch(i,'(RFC822)')
    for response_part in data:
        if isinstance(response_part, tuple):
            msg = email.message_from_string(response_part[1])
            sender = msg['reply-to'].split()[0]
            address = re.sub(r'[<>]','',sender)
# Ignore any occurrences of own email address and add to list
    if not re.search(r'' + re.escape(LOGIN),address) and not address in email_list:
        email_list.append(address)
        print address

1 answer:

Answer (score: 3)

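The right way to do this is to use parseaddr from the email.utils package in the standard library, rather than messing around with string splitting and slicing. It correctly handles the various legal address formats that can appear in email headers.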

Some examples:

>>> from email.utils import parseaddr
>>> parseaddr("sally@foo.com")
('', 'sally@foo.com')
>>> parseaddr("<sally@foo.com>")
('', 'sally@foo.com')
>>> parseaddr("Sally <sally@foo.com>")
('Sally', 'sally@foo.com')
>>> parseaddr("Sally Smith <sally@foo.com>")
('Sally Smith', 'sally@foo.com')
>>> 
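To see why the question's code returns something other than an address, here is a small comparison sketch (Python 3; the helper names `naive_extract` and `extract_address` are hypothetical). `naive_extract` mirrors the question's `split()[0]` plus `re.sub(r'[<>]', '', ...)`, which only works when the header is a bare address; as soon as a display name is present, the first token is part of the name. `parseaddr` returns a `(display_name, address)` pair, so the address is simply element `[1]`.

```python
import re
from email.utils import parseaddr

def naive_extract(header_value):
    # Mirrors the question's approach: take the first whitespace-separated
    # token and strip any angle brackets from it.
    return re.sub(r'[<>]', '', header_value.split()[0])

def extract_address(header_value):
    # parseaddr returns (display_name, address); keep only the address.
    return parseaddr(header_value)[1]

headers = [
    'sally@foo.com',
    'Sally Smith <sally@foo.com>',
    '"Smith, Sally" <sally@foo.com>',
]
for header in headers:
    print(repr(naive_extract(header)), repr(extract_address(header)))
```

For the second and third headers, the naive version yields a fragment of the display name (`Sally` and `"Smith,`), while `parseaddr` yields `sally@foo.com` each time. Note that `parseaddr` expects a single address; for a header listing several, `email.utils.getaddresses` is the appropriate tool.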

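Also, you should not assume that an email has a Reply-To header. Many don't, and in that case `msg['reply-to']` is `None`, so calling `.split()` on it raises an `AttributeError`.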
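One way to handle a missing header is to fall back to From. The sketch below is a rough Python 3 adaptation of the question's loop, not a drop-in replacement: it assumes the raw messages have already been fetched (in Python 3, `imaplib` returns them as bytes, hence `email.message_from_bytes`), and the names `reply_address` and `collect_addresses` are hypothetical. Falling back to From is an assumption about what the asker wants, and the own-address check is an exact, case-insensitive comparison instead of the question's substring `re.search`.

```python
import email
from email.utils import parseaddr

def reply_address(msg):
    # Prefer Reply-To and fall back to From, since many messages have no
    # Reply-To header. Returns None if neither yields an address.
    for header in ('Reply-To', 'From'):
        value = msg.get(header)  # None if the header is absent
        if value:
            addr = parseaddr(value)[1]
            if addr:
                return addr
    return None

def collect_addresses(raw_messages, own_address):
    # raw_messages: iterable of raw RFC 822 messages as bytes, e.g. the
    # response_part[1] values from mail.fetch(i, '(RFC822)') in Python 3.
    found = []
    for raw in raw_messages:
        msg = email.message_from_bytes(raw)
        addr = reply_address(msg)
        # Skip messages with no usable address, our own address, and repeats.
        if addr and addr.lower() != own_address.lower() and addr not in found:
            found.append(addr)
    return found

messages = [
    b"From: Bob <bob@example.com>\r\nReply-To: Sally Smith <sally@foo.com>\r\n\r\nhi\r\n",
    b"From: login@gmail.com\r\n\r\nmine\r\n",
    b"From: \"Smith, Sally\" <sally@foo.com>\r\n\r\nagain\r\n",
]
print(collect_addresses(messages, 'login@gmail.com'))
```

Here the first message contributes its Reply-To address, the second is skipped as the account's own address, and the third (no Reply-To, so From is used) is a duplicate. A list is used rather than a set to keep first-seen order, like `email_list` in the question.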