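I have a small Python script that pulls emails from a POP mail address and dumps them into files (one email per file).
A PHP script then iterates over the files and displays them.
I am having trouble with ISO-8859-1 (Latin-1) encoded emails.
Here is an example of the text I get: =?iso-8859-1?Q?G=EDsli_Karlsson?= and Sj=E1um hva=F0 =F3li er kl=E1r J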
This is the code I use to pull the emails:

    import poplib
    import random
    import rfc822
    import StringIO

    pop = poplib.POP3(server)
    mail_list = pop.list()[1]
    for m in mail_list:
        mno, size = m.split()
        lines = pop.retr(mno)[1]
        file = StringIO.StringIO("\r\n".join(lines))
        # rfc822.Message consumes the headers; the rest of file is the body
        msg = rfc822.Message(file)
        body = file.readlines()
        f = open(str(random.randint(1,100)) + ".email", "w")
        f.write(msg["From"] + "\n")
        f.write(msg["Subject"] + "\n")
        f.write(msg["Date"] + "\n")
        for b in body:
            f.write(b)
        f.close()
I have tried every encoding/decoding combination in both Python and PHP.
Answer 0 (score: 3)
You can use the Python email library (Python 2.5+) to avoid these problems:
    import email
    import poplib
    import random
    from cStringIO import StringIO
    from email.generator import Generator

    pop = poplib.POP3(server)
    # (login with pop.user() / pop.pass_() omitted here)
    mail_count = len(pop.list()[1])
    # POP3 message numbers start at 1, not 0
    for message_num in xrange(1, mail_count + 1):
        message = "\r\n".join(pop.retr(message_num)[1])
        message = email.message_from_string(message)

        # flatten the Message object back into a string
        out_file = StringIO()
        message_gen = Generator(out_file, mangle_from_=False, maxheaderlen=60)
        message_gen.flatten(message)
        message_text = out_file.getvalue()

        filename = "%s.email" % random.randint(1,100)
        email_file = open(filename, "w")
        email_file.write(message_text)
        email_file.close()
    pop.quit()
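This code fetches every message from the server, turns each one into a Python Message object, and then flattens it back into a string to write to a file. Because it uses the email package from the Python standard library, the MIME encoding and decoding issues should be handled for you.
Disclaimer: I haven't tested this code, but it should work.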
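On Python 3, `cStringIO` and `xrange` used above no longer exist. A minimal sketch of the same idea there, assuming the raw message bytes have already been fetched (for example `b"\r\n".join(pop.retr(n)[1])`, since Python 3's `poplib` returns lines as bytes): parsing with `email.policy.default` returns header values already decoded from RFC 2047, and `get_body()` / `get_content()` undo the body's transfer encoding and charset. The helper name `decode_message` and the sample message are made up for illustration.

```python
from email import policy
from email.parser import BytesParser


def decode_message(raw):
    """Parse raw message bytes and return decoded (From, Subject, body text).

    Hypothetical helper: raw is the complete message, headers plus body.
    """
    # policy.default makes msg[...] return str values with any RFC 2047
    # encoded words (=?charset?Q?...?=) already decoded.
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    # get_body() picks the preferred text part (the message itself when it
    # is not multipart); get_content() undoes quoted-printable/base64 and
    # decodes the bytes with the part's declared charset.
    part = msg.get_body(preferencelist=("plain", "html"))
    body = part.get_content() if part is not None else ""
    return str(msg["From"] or ""), str(msg["Subject"] or ""), body


# Made-up sample built from the strings in the question.
raw = (
    b"From: =?iso-8859-1?Q?G=EDsli_Karlsson?= <gisli@example.com>\r\n"
    b"Subject: =?iso-8859-1?Q?G=EDsli_Karlsson?=\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Sj=E1um hva=F0 =F3li er kl=E1r J\r\n"
)
sender, subject, body = decode_message(raw)
```

Writing these strings to a file opened with `encoding="utf-8"` would then hand the PHP side plain UTF-8 text rather than encoded words.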
Answer 1 (score: 2)
This is MIME encoding of headers, as defined in RFC 2047. Here is how to decode it in Python:
    import email.Header
    import sys

    # decode_header returns a list of (text, charset) pairs;
    # charset is None for chunks that were not encoded
    header_and_encoding = email.Header.decode_header(sys.stdin.readline())
    for part in header_and_encoding:
        if part[1] is None:
            print part[0],
        else:
            upart = (part[0]).decode(part[1])
            print upart.encode('latin-1'),
    print
A more detailed explanation (in French) is in …
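The same decoding on Python 3, applied to both strings from the question, as a minimal sketch: the `=?iso-8859-1?Q?...?=` part is an RFC 2047 encoded word and goes through `decode_header` / `make_header`, while the body text is ordinary quoted-printable, so `quopri` turns the `=XX` escapes into bytes that are then read as Latin-1. Unlike the Python 2 loop above, `make_header` takes care of joining the decoded chunks, which matters because on Python 3 `decode_header` can return a mix of `bytes` and `str` chunks.

```python
import quopri
from email.header import decode_header, make_header

# Header value: an RFC 2047 encoded word (charset iso-8859-1, Q encoding).
raw_header = "=?iso-8859-1?Q?G=EDsli_Karlsson?="
# decode_header splits the value into (bytes, charset) chunks;
# make_header reassembles them into a Header whose str() is Unicode text.
name = str(make_header(decode_header(raw_header)))

# Body text: plain quoted-printable, so turn =XX escapes into raw bytes,
# then interpret those bytes as Latin-1.
raw_body = b"Sj=E1um hva=F0 =F3li er kl=E1r J"
text = quopri.decodestring(raw_body).decode("latin-1")

print(name)  # Gísli Karlsson
print(text)  # Sjáum hvað óli er klár J
```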
Answer 2 (score: 2)
There is probably a better way to do this, but this is what I ended up with. Thanks for the help.
    import poplib, quopri
    import random
    import sys, StringIO
    import email
    from email.Generator import Generator

    user = "email@example.com"
    password = "password"
    server = "mail.example.com"

    # connects
    try:
        pop = poplib.POP3(server)
    except Exception:
        print "Error connecting to server"
        sys.exit(-1)

    # user auth
    try:
        print pop.user(user)
        print pop.pass_(password)
    except Exception:
        print "Authentication error"
        sys.exit(-2)

    # gets the mail list
    mail_list = pop.list()[1]
    for m in mail_list:
        mno, size = m.split()
        message = "\r\n".join(pop.retr(mno)[1])
        message = email.message_from_string(message)

        # uses the email flatten
        out_file = StringIO.StringIO()
        message_gen = Generator(out_file, mangle_from_=False, maxheaderlen=60)
        message_gen.flatten(message)
        message_text = out_file.getvalue()

        # fixes mime encoding issues (for display within html)
        clean_text = quopri.decodestring(message_text)
        msg = email.message_from_string(clean_text)

        # finds the last body (when in mime multipart, html is the last one);
        # multipart containers are skipped since they carry no payload of their own
        body = ""
        for part in msg.walk():
            if not part.is_multipart():
                body = part.get_payload(decode=True)

        filename = "%s.email" % random.randint(1,100)
        email_file = open(filename, "w")
        # a missing header comes back as None, so fall back to ""
        for header in ("From", "Return-Path", "Subject", "Date"):
            email_file.write((msg[header] or "") + "\n")
        email_file.write(body or "")
        email_file.close()

    pop.quit()
    sys.exit()
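Running `quopri.decodestring` over the whole flattened message decodes every `=XX` sequence in the text, including ones in headers and in parts that were never quoted-printable. A possible alternative is to let each part decode itself. A minimal Python 3 sketch, with a hypothetical helper `decoded_parts` and a made-up sample message: `get_payload(decode=True)` applies the part's own Content-Transfer-Encoding, and `get_content_charset()` supplies the charset for turning the bytes into text.

```python
import email


def decoded_parts(raw):
    """Return (content type, decoded text) for each leaf part of a message."""
    msg = email.message_from_bytes(raw)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart containers hold sub-parts, not content
        # decode=True reverses base64 or quoted-printable according to this
        # part's own Content-Transfer-Encoding header and returns bytes.
        data = part.get_payload(decode=True)
        charset = part.get_content_charset() or "us-ascii"
        parts.append((part.get_content_type(), data.decode(charset, errors="replace")))
    return parts


# Made-up message: quoted-printable Latin-1 text plus a base64 UTF-8 HTML part.
raw = (
    b'MIME-Version: 1.0\r\n'
    b'Content-Type: multipart/alternative; boundary="XYZ"\r\n'
    b'\r\n'
    b'--XYZ\r\n'
    b'Content-Type: text/plain; charset=iso-8859-1\r\n'
    b'Content-Transfer-Encoding: quoted-printable\r\n'
    b'\r\n'
    b'Sj=E1um hva=F0 =F3li er kl=E1r J\r\n'
    b'--XYZ\r\n'
    b'Content-Type: text/html; charset=utf-8\r\n'
    b'Content-Transfer-Encoding: base64\r\n'
    b'\r\n'
    b'PHA+SMOpPC9wPg==\r\n'
    b'--XYZ--\r\n'
)
parts = decoded_parts(raw)
```

The headers still need RFC 2047 decoding separately (for example with `email.header.decode_header`), since payload decoding does not touch them.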
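Answer 3 (score: 1)
Until recently, plain Latin-N or UTF-N was not allowed in headers, so such text gets encoded with the method first described in RFC 1522, which has since been superseded by RFC 2047. Accented characters are encoded as quoted-printable or Base64, marked by ?Q? (or ?B? for Base64). You have to decode them. Oh, and spaces are encoded as "_". See Wikipedia.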
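To make the ?Q? / ?B? mechanics concrete, here is a hand-rolled sketch in Python 3 that finds each `=?charset?encoding?text?=` word and decodes it: Q text is quoted-printable with `_` meaning space (which `quopri.decodestring(..., header=True)` handles), B text is Base64. The names `ENCODED_WORD`, `decode_word` and `decode_encoded_words` are hypothetical, and the sketch skips RFC 2047 details such as dropping whitespace between adjacent encoded words, so `email.header.decode_header` remains the better choice in real code.

```python
import base64
import quopri
import re

# One RFC 2047 encoded word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([QqBb])\?([^?]*)\?=")


def decode_word(match):
    charset, encoding, text = match.groups()
    data = text.encode("ascii")
    if encoding.upper() == "Q":
        # Q encoding: quoted-printable, with "_" standing for a space.
        raw = quopri.decodestring(data, header=True)
    else:
        # B encoding: plain Base64.
        raw = base64.b64decode(data)
    return raw.decode(charset)


def decode_encoded_words(value):
    """Replace every encoded word in a header value with its decoded text."""
    return ENCODED_WORD.sub(decode_word, value)


print(decode_encoded_words("=?iso-8859-1?Q?G=EDsli_Karlsson?="))  # Gísli Karlsson
```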
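Answer 4 (score: 0)
That is MIME-encoded content; it's what the email actually looks like, not a bug somewhere. You need to use a MIME decoding library (or decode it by hand) on the PHP side, since, if I understand correctly, that is the part acting as the email renderer.
In Python you would use mimetools. In PHP I'm not sure; the Zend Framework seems to have a MIME parser somewhere, and there are probably countless snippets floating around.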