Are line breaks legal in MIME headers that use encoded-words?

Time: 2018-09-21 11:10:46

Tags: email mime rfc

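RFC 2047 defines the encoded-words mechanism for encoding non-ASCII characters in MIME documents. It specifies that whitespace characters (space and tab) are not allowed inside an encoded-word.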

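However, RFC 5322, which covers parsing of email MIME documents, specifies that long header lines should be "folded". Should this folding be handled before or after encoded-word decoding?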

I recently received an email in which the encoded-text part of a header contained a line break, like this:

Header: =?UTF-8?Q?=C3=A5
 =C3=A4?=

Is this valid?

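Of course, emails can be invalid in many exciting ways and parsers need to cope with that, but it would be interesting to know what the "correct" way is. :)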

1 Answer:

Answer 0: (score: 1)

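I misread the question and answered as though it were about a different kind of whitespace. In this case the whitespace appears inside an encoded-word, rather than between multiple encoded-words.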

This is explicitly forbidden. From the introduction to the format in RFC 2047:

2. Syntax of encoded-words

   An 'encoded-word' is defined by the following ABNF grammar.  The
   notation of RFC 822 is used, with the exception that white space
   characters MUST NOT appear between components of an 'encoded-word'.

And later in the same section:

   IMPORTANT: 'encoded-word's are designed to be recognized as 'atom's
   by an RFC 822 parser.  As a consequence, unencoded white space
   characters (such as SPACE and HTAB) are FORBIDDEN within an
   'encoded-word'.  For example, the character sequence

      =?iso-8859-1?q?this is some text?=

   would be parsed as four 'atom's, rather than as a single 'atom' (by
   an RFC 822 parser) or 'encoded-word' (by a parser which understands
   'encoded-words').  The correct way to encode the string "this is some
   text" is to encode the SPACE characters as well, e.g.

      =?iso-8859-1?q?this=20is=20some=20text?=

   The characters which may appear in 'encoded-text' are further
   restricted by the rules in section 5.
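
The rule quoted above can be checked mechanically. The following is a minimal sketch, not a full RFC 2047 parser: the helper name `is_encoded_word` is hypothetical, and the character classes are my reading of the `token` and `encoded-text` productions in section 2, together with the 75-character limit on an encoded-word.

```python
import re

# "especials" from RFC 2047 section 2; they may not appear in charset/encoding.
ESPECIALS = '()<>@,;:\\"/[]?.='
PRINTABLE = [chr(c) for c in range(0x21, 0x7F)]  # printable ASCII, no SPACE

# token: printable ASCII except SPACE and especials
TOKEN_CHARS = "".join(c for c in PRINTABLE if c not in ESPECIALS)
# encoded-text: printable ASCII except "?" and SPACE
TEXT_CHARS = "".join(c for c in PRINTABLE if c != "?")

ENCODED_WORD = re.compile(
    r"=\?[{tok}]+\?[{tok}]+\?[{txt}]+\?=".format(
        tok=re.escape(TOKEN_CHARS), txt=re.escape(TEXT_CHARS)
    )
)


def is_encoded_word(s):
    """Return True if s is syntactically a single RFC 2047 encoded-word.

    Hypothetical helper: checks syntax and length only, not whether the
    charset or encoding is one a decoder actually supports.
    """
    return len(s) <= 75 and ENCODED_WORD.fullmatch(s) is not None


print(is_encoded_word("=?iso-8859-1?q?this=20is=20some=20text?="))
print(is_encoded_word("=?iso-8859-1?q?this is some text?="))
print(is_encoded_word("=?UTF-8?Q?=C3=A5\r\n =C3=A4?="))
```

Under this check the header from the question is rejected, because the CR, LF and SPACE introduced by the fold all fall outside `encoded-text`.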

Earlier answer

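This sort of thing is explicitly allowed. Header lines containing encoded-words should be 76 characters or fewer and folded where needed. An RFC 822 folded header has its second and any subsequent lines indented. RFC 2047 headers are supposed to be indented by only one space. The whitespace between the ?= on the first line and the =? on the next should be suppressed in the output.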

See the example at the bottom of page 12 of the RFC:

encoded form                                displayed as
---------------------------------------------------------------------
(=?ISO-8859-1?Q?a?=                         (ab)
   =?ISO-8859-1?Q?b?=)

       Any amount of linear-space-white between 'encoded-word's,
       even if it includes a CRLF followed by one or more SPACEs,
       is ignored for the purposes of display.
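
As a rough illustration of that display rule, here is a small sketch that decodes runs of adjacent encoded-words and drops the whitespace between them. The function names are hypothetical, the regular expressions are a simplification of the RFC grammar, and only the "Q" and "B" encodings with a charset name that Python recognises are handled.

```python
import base64
import re

# Simplified encoded-word pattern: no "?" or whitespace inside any component.
EW = r"=\?[^?\s]+\?[QqBb]\?[^?\s]+\?="
EW_PARTS = re.compile(r"=\?([^?\s]+)\?([QqBb])\?([^?\s]+)\?=")
# One or more encoded-words separated only by whitespace (folds included).
EW_RUN = re.compile(EW + r"(?:[ \t\r\n]+" + EW + r")*")


def _decode_word(match):
    """Decode one encoded-word; each word is decoded with its own charset."""
    charset, encoding, text = match.group(1), match.group(2), match.group(3)
    if encoding.lower() == "b":
        raw = base64.b64decode(text)
    else:
        # "Q" encoding: "_" stands for SPACE, "=XX" is one hex-encoded byte.
        raw = re.sub(
            rb"=([0-9A-Fa-f]{2})",
            lambda h: bytes([int(h.group(1), 16)]),
            text.encode("ascii").replace(b"_", b" "),
        )
    return raw.decode(charset)


def decode_header_value(value):
    """Decode encoded-words, ignoring whitespace between adjacent ones."""

    def decode_run(run):
        return "".join(_decode_word(m) for m in EW_PARTS.finditer(run.group(0)))

    return EW_RUN.sub(decode_run, value)


print(decode_header_value("(=?ISO-8859-1?Q?a?=\r\n   =?ISO-8859-1?Q?b?=)"))  # (ab)
print(decode_header_value("(=?ISO-8859-1?Q?a?= b)"))  # (a b)
```

Whitespace is only dropped when it sits between two encoded-words; whitespace next to ordinary text is kept. A value folded inside the encoded-text, as in the question, does not match the pattern at all and is returned unchanged.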