I'm writing a Python script to process email that Procmail hands to it. As suggested in this question, I'm using the following Procmail configuration:
:0:
|$HOME/process_mail.py
My process_mail.py script receives the email on stdin, like this:
From hostname Tue Jun 15 21:43:30 2010
Received: (qmail 8580 invoked from network); 15 Jun 2010 21:43:22 -0400
Received: from mail-fx0-f44.google.com (209.85.161.44)
by ip-73-187-35-131.ip.secureserver.net with SMTP; 15 Jun 2010 21:43:22 -0400
Received: by fxm19 with SMTP id 19so170709fxm.3
for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.103.84.1 with SMTP id m1mr2774225mul.26.1276652853684; Tue, 15
Jun 2010 18:47:33 -0700 (PDT)
Received: by 10.123.143.4 with HTTP; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Date: Tue, 15 Jun 2010 20:47:33 -0500
Message-ID: <AANLkTikFsIjJ3KYW1HJWcAqQlGXNiXE2YMzrj39I0tdB@mail.gmail.com>
Subject: TEST 12
From: Full Name <username@sender.com>
To: username@domain.com
Content-Type: text/plain; charset=ISO-8859-1
ONE
TWO
THREE
I'm trying to parse the message like this:
>>> import email
>>> msg = email.message_from_string(full_message)
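I'd like to get message fields such as "From", "To" and "Subject". However, the message object doesn't contain any of these fields.
What am I doing wrong?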
Answer 0 (score: 10)
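You have to make sure the lines aren't accidentally broken (as they are above, though it's hard to tell whether that's just a copy-paste problem). With a full message such as: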
Received: (qmail 8580 invoked from network); 15 Jun 2010 21:43:22 -0400
Received: from mail-fx0-f44.google.com (209.85.161.44) by ip-73-187-35-131.ip.secureserver.net with SMTP; 15 Jun 2010 21:43:22 -0400
Received: by fxm19 with SMTP id 19so170709fxm.3 for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.103.84.1 with SMTP id m1mr2774225mul.26.1276652853684; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Received: by 10.123.143.4 with HTTP; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Date: Tue, 15 Jun 2010 20:47:33 -0500
Message-ID: <AANLkTikFsIjJ3KYW1HJWcAqQlGXNiXE2YMzrj39I0tdB@mail.gmail.com>
Subject: TEST 12
From: Full Name <username@sender.com>
To: username@domain.com
Content-Type: text/plain; charset=ISO-8859-1
ONE
TWO
THREE
then running
import email
msg = email.message_from_string(msgtxt)
print(msg['Subject'])
prints TEST 12, as desired.
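A minimal sketch, assuming Python 3, of how process_mail.py could avoid assembling the message string by hand at all: the parser reads the stream that Procmail pipes to stdin directly, so every line break arrives exactly as sent. The read_message helper and the in-memory sample are illustrative only, not code from the original posts.

```python
import email
import io
import sys
from email.message import Message
from typing import Optional, TextIO


def read_message(stream: Optional[TextIO] = None) -> Message:
    """Parse one message from a text stream (Procmail pipes it to stdin)."""
    # Handing the stream straight to the parser keeps the original line
    # breaks, so header folding is not disturbed.
    return email.message_from_file(stream if stream is not None else sys.stdin)


# Hypothetical in-memory stand-in for stdin; the real script would just
# call read_message() and let it default to sys.stdin.
sample = io.StringIO(
    "From hostname Tue Jun 15 21:43:30 2010\n"
    "Subject: TEST 12\n"
    "From: Full Name <username@sender.com>\n"
    "To: username@domain.com\n"
    "\n"
    "ONE\n"
)
msg = read_message(sample)
print(msg.get_unixfrom())  # the mbox "From " envelope line, stored separately
for field in ("From", "To", "Subject"):
    print(field, "->", msg[field])
```

The leading "From hostname ..." line is the mbox envelope line, not a header; the parser stores it via get_unixfrom() rather than as the "From" header.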
Answer 1 (score: 4)
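It looks like your continuation lines are missing their leading whitespace. A line break inside a header field must be followed by a space or tab; otherwise it is illegal according to RFC 2822 §2.2.3: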
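Each header field is logically a single line of characters comprising the field name, the colon, and the field body. For convenience however, and to deal with the 998/78 character limitations per line, the field body portion of a header field can be split into a multiple line representation; this is called "folding". The general rule is that wherever this standard allows for folding white space (not simply WSP characters), a CRLF may be inserted before any WSP. For example, the header field:
Subject: This is a test
can be represented as:
Subject: This
 is a test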
It should look like this:
From hostname Tue Jun 15 21:43:30 2010
Received: (qmail 8580 invoked from network); 15 Jun 2010 21:43:22 -0400
Received: from mail-fx0-f44.google.com (209.85.161.44)
        by ip-73-187-35-131.ip.secureserver.net with SMTP; 15 Jun 2010 21:43:22 -0400
Received: by fxm19 with SMTP id 19so170709fxm.3
        for <username@domain.com>; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.103.84.1 with SMTP id m1mr2774225mul.26.1276652853684; Tue, 15
        Jun 2010 18:47:33 -0700 (PDT)
Received: by 10.123.143.4 with HTTP; Tue, 15 Jun 2010 18:47:33 -0700 (PDT)
Date: Tue, 15 Jun 2010 20:47:33 -0500
Message-ID: <AANLkTikFsIjJ3KYW1HJWcAqQlGXNiXE2YMzrj39I0tdB@mail.gmail.com>
Subject: TEST 12
From: Full Name <username@sender.com>
To: username@domain.com
Content-Type: text/plain; charset=ISO-8859-1

ONE
TWO
THREE
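To see the effect of the missing whitespace, here is a small sketch, assuming Python 3.3+ (for the policy argument); the two-header sample is trimmed from the message above. It parses the same text once with a bare continuation line and once correctly folded.

```python
import email
from email import errors, policy

# Trimmed sample: the second line continues the Received header but lacks
# the leading whitespace that RFC 2822 folding requires.
BROKEN = (
    "Received: from mail-fx0-f44.google.com (209.85.161.44)\n"
    "by ip-73-187-35-131.ip.secureserver.net with SMTP\n"
    "Subject: TEST 12\n"
    "\n"
    "ONE\n"
)
# Same text, correctly folded: the continuation line starts with a space.
FOLDED = BROKEN.replace("\nby ", "\n by ", 1)

broken = email.message_from_string(BROKEN, policy=policy.default)
folded = email.message_from_string(FOLDED, policy=policy.default)

# The bare "by ..." line is not a valid header line, so the parser ends the
# header block there; everything after it, Subject included, becomes body.
print(broken["Subject"])  # None
print(any(isinstance(d, errors.MissingHeaderBodySeparatorDefect)
          for d in broken.defects))
print(folded["Subject"])  # TEST 12
# policy.default unfolds the header value when it is read back.
print(folded["Received"])
```

This matches the symptom in the question: the fields are not missing from the input, they simply end up in the body because the header block ends at the first malformed line.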
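Answer 2 (score: 2)
I'm answering my own question.
I found a bug in the code that builds the message: it was inserting extra newlines between some lines, which prevented the parser from working properly.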
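The post doesn't show the faulty code, so the following is only a hypothetical reconstruction of this class of bug: if the text is rebuilt from lines that already end in a newline and another newline is appended to each one, the first blank line ends the header block early.

```python
import email

# Hypothetical input: lines as produced by iterating over sys.stdin, each of
# which already ends in "\n".
raw_lines = [
    "Date: Tue, 15 Jun 2010 20:47:33 -0500\n",
    "Subject: TEST 12\n",
    "To: username@domain.com\n",
    "\n",
    "ONE\n",
]

# Buggy rebuild: the extra "\n" turns every line break into a blank line,
# and the parser treats the first blank line as the end of the headers.
buggy_text = "".join(line + "\n" for line in raw_lines)
# Fixed rebuild: keep the original line endings as they are.
fixed_text = "".join(raw_lines)

buggy = email.message_from_string(buggy_text)
fixed = email.message_from_string(fixed_text)
print(buggy["Date"])     # only the first header survives
print(buggy["Subject"])  # None
print(fixed["Subject"])  # TEST 12
```

In the Procmail script, passing sys.stdin straight to email.message_from_file sidesteps the rebuild step entirely.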