Parsing email with Python

Date: 2010-06-16 02:10:41

Tags: python email parsing mime

I am writing a Python script to process emails handed over by Procmail. As suggested in this question, I am using the following Procmail configuration:

:0:
|$HOME/process_mail.py

My process_mail.py script receives the email on stdin, like this:

From hostname Tue Jun 15 21:43:30 2010
Received: (qmail 8580 invoked from network); 15 Jun 2010 21:43:22 -0400
Received: from mail-fx0-f44.google.com (209.85.161.44)
by ip-73-187-35-131.ip.secureserver.net with SMTP; 15 Jun 2010 21:43:22 -0400
Received: by fxm19 with SMTP id 19so170709fxm.3
for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.103.84.1 with SMTP id m1mr2774225mul.26.1276652853684; Tue, 15
Jun 2010 18:47:33 -0700 (PDT)
Received: by 10.123.143.4 with HTTP; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Date: Tue, 15 Jun 2010 20:47:33 -0500
Message-ID: <AANLkTikFsIjJ3KYW1HJWcAqQlGXNiXE2YMzrj39I0tdB@mail.gmail.com>
Subject: TEST 12
From: Full Name <username@sender.com>
To: username@domain.com
Content-Type: text/plain; charset=ISO-8859-1

ONE
TWO
THREE

I am trying to parse the message this way:

>>> import email
>>> msg = email.message_from_string(full_message)

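I want to get message fields such as "From", "To", and "Subject". However, the message object does not contain any of these fields.

What am I doing wrong?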

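3 answers:

Answer 0 (score: 10)

You have to make sure the header lines are not accidentally broken (as they are above, although it is hard to tell whether that is just a copy-and-paste problem). With the message intact, for example: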

Received: (qmail 8580 invoked from network); 15 Jun 2010 21:43:22 -0400
Received: from mail-fx0-f44.google.com (209.85.161.44) by ip-73-187-35-131.ip.secureserver.net with SMTP; 15 Jun 2010 21:43:22 -0400
Received: by fxm19 with SMTP id 19so170709fxm.3 for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.103.84.1 with SMTP id m1mr2774225mul.26.1276652853684; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Received: by 10.123.143.4 with HTTP; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Date: Tue, 15 Jun 2010 20:47:33 -0500
Message-ID: <AANLkTikFsIjJ3KYW1HJWcAqQlGXNiXE2YMzrj39I0tdB@mail.gmail.com>
Subject: TEST 12
From: Full Name <username@sender.com>
To: username@domain.com
Content-Type: text/plain; charset=ISO-8859-1

ONE
TWO
THREE

then

import email
msg = email.message_from_string(msgtxt)  # msgtxt holds the message text above
print(msg['Subject'])

prints TEST 12, as desired.
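As a minimal sketch of pulling out the fields the question asks about, assuming the headers arrive intact: the standard library's `email.utils.parseaddr` can split the From header into a display name and an address. The shortened `msgtxt` below is an illustrative stand-in for the full message above, not the exact input.

```python
import email
from email.utils import parseaddr

# Shortened stand-in for the full message above (illustrative only).
msgtxt = (
    "Subject: TEST 12\n"
    "From: Full Name <username@sender.com>\n"
    "To: username@domain.com\n"
    "Content-Type: text/plain; charset=ISO-8859-1\n"
    "\n"
    "ONE\nTWO\nTHREE\n"
)

msg = email.message_from_string(msgtxt)

# Header lookup is case-insensitive and returns None for a missing field.
subject = msg["Subject"]
sender_name, sender_addr = parseaddr(msg["From"])
recipient = msg["To"]
body = msg.get_payload()  # plain string for a non-multipart message

print(subject, sender_name, sender_addr, recipient)
```

Because a missing header comes back as None rather than raising, it may be worth checking the values before using them.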

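Answer 1 (score: 4)

It looks like your continuation lines start on a new line with no whitespace in front of them, which is illegal according to RFC 2822 §2.2.3: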

  

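  Each header field is logically a single line of characters comprising
  the field name, the colon, and the field body. For convenience
  however, and to deal with the 998/78 character limitations per line,
  the field body portion of a header field can be split into a multiple
  line representation; this is called "folding". The general rule is
  that wherever this standard allows for folding white space (not
  simply WSP characters), a CRLF may be inserted before any WSP. For
  example, the header field:

    Subject: This is a test

  can be represented as:

    Subject: This
     is a test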

It should look like this:

From hostname Tue Jun 15 21:43:30 2010
Received: (qmail 8580 invoked from network); 15 Jun 2010 21:43:22 -0400
Received: from mail-fx0-f44.google.com (209.85.161.44)
    by ip-73-187-35-131.ip.secureserver.net with SMTP; 15 Jun 2010 21:43:22 -0400
Received: by fxm19 with SMTP id 19so170709fxm.3
    for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.103.84.1 with SMTP id m1mr2774225mul.26.1276652853684; Tue, 15
    Jun 2010 18:47:33 -0700 (PDT)
Received: by 10.123.143.4 with HTTP; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Date: Tue, 15 Jun 2010 20:47:33 -0500
Message-ID: <AANLkTikFsIjJ3KYW1HJWcAqQlGXNiXE2YMzrj39I0tdB@mail.gmail.com>
Subject: TEST 12
From: Full Name <username@sender.com>
To: username@domain.com
Content-Type: text/plain; charset=ISO-8859-1

ONE
TWO
THREE
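To illustrate the folding rule, here is a small sketch that parses the same hypothetical message twice: once with the continuation line indented, and once with the indentation removed. The host names and addresses are made up for the example; only the `email` calls are from the standard library.

```python
import email

# Properly folded: the continuation line starts with whitespace.
FOLDED = (
    "Received: from mail.example.com (192.0.2.1)\n"
    "    by mx.example.net with SMTP; 15 Jun 2010 21:43:22 -0400\n"
    "Subject: TEST 12\n"
    "\n"
    "ONE\n"
)

# Same text with the leading whitespace stripped from the continuation.
BROKEN = FOLDED.replace("\n    by", "\nby")

folded_msg = email.message_from_string(FOLDED)
broken_msg = email.message_from_string(BROKEN)

# With folding intact, Subject is parsed as a header.
print(folded_msg["Subject"])

# Without it, header parsing stops at the "by ..." line, so everything
# from that line on (including Subject) is treated as body text.
print(broken_msg["Subject"])
print(broken_msg.get_payload())
```

The parser does not raise in the broken case; it quietly ends the header block at the first line that is neither a header nor a continuation.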

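Answer 2 (score: 2)

I am answering my own question.

I found a bug in the code that builds the message. It was appending newlines between some of the lines, which prevented the parser from working properly.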
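A short sketch of why an extra newline has this effect, using a made-up message: the first blank line marks the end of the headers, so any header after it lands in the body. `io.StringIO` stands in for `sys.stdin` here so that the example runs on its own.

```python
import email
import io

# Hypothetical input in which a stray blank line was added between headers.
raw = (
    "From: Full Name <username@sender.com>\n"
    "\n"  # stray blank line: the parser treats this as the end of the headers
    "To: username@domain.com\n"
    "Subject: TEST 12\n"
    "\n"
    "ONE\n"
)

# io.StringIO stands in for sys.stdin so the sketch runs anywhere.
msg = email.message_from_file(io.StringIO(raw))

print(msg.keys())         # only the headers before the blank line
print(msg["Subject"])     # None: the line ended up in the body instead
print(msg.get_payload())  # the remaining "headers" appear here
```

In a script like process_mail.py, one option (an assumption about the setup, not something stated in the thread) is to pass `sys.stdin` directly to `email.message_from_file`, or on Python 3 to pass `sys.stdin.buffer` to `email.message_from_binary_file`, so the message never has to be rebuilt line by line.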