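RFC 2047 defines the encoded-words mechanism for encoding non-ASCII characters in MIME documents. It specifies that whitespace characters (space and tab) are not allowed inside an encoded-word.
However, RFC 5322, which governs the parsing of email MIME documents, specifies that long header lines should be "folded". Should this folding be undone before or after the encoded-words are decoded?
I recently received an email that has a line break inside the encoded-text part of a header, like this: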
Header: =?UTF-8?Q?=C3=A5
=C3=A4?=
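Is this valid?
Of course, email can be invalid in many exciting ways, and a parser has to cope with that, but it would be interesting to know the "correct" way. :)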
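Answer 0 (score: 1)
I misunderstood the question at first and answered as if it were about a different kind of whitespace. In this case the whitespace appears inside a single MIME encoded-word, rather than between multiple encoded-words separated by whitespace.
This is explicitly forbidden. From the introduction to the format in RFC 2047: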
2. Syntax of encoded-words
An 'encoded-word' is defined by the following ABNF grammar. The
notation of RFC 822 is used, with the exception that white space
characters MUST NOT appear between components of an 'encoded-word'.
Then, later in the same section:
IMPORTANT: 'encoded-word's are designed to be recognized as 'atom's
by an RFC 822 parser. As a consequence, unencoded white space
characters (such as SPACE and HTAB) are FORBIDDEN within an
'encoded-word'. For example, the character sequence
=?iso-8859-1?q?this is some text?=
would be parsed as four 'atom's, rather than as a single 'atom' (by
an RFC 822 parser) or 'encoded-word' (by a parser which understands
'encoded-words'). The correct way to encode the string "this is some
text" is to encode the SPACE characters as well, e.g.
=?iso-8859-1?q?this=20is=20some=20text?=
The characters which may appear in 'encoded-text' are further
restricted by the rules in section 5.
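To make the rule concrete, here is a minimal Python sketch of what happens to the header from the question. It assumes the second line started with a space (otherwise it is not a fold at all), and the regular expression is my own simplified approximation of the RFC 2047 grammar, not the full ABNF. The helper names `unfold` and `is_encoded_word` are made up for illustration.

```python
import re

# Simplified encoded-word pattern (an approximation of the RFC 2047 grammar):
# "=?" charset "?" encoding "?" encoded-text "?=", with no whitespace and
# no "?" allowed inside charset or encoded-text.
ENCODED_WORD = re.compile(r"=\?[^\s?]+\?[BbQq]\?[^\s?]+\?=")


def unfold(value: str) -> str:
    """RFC 5322 unfolding: drop every CRLF that is followed by a space or tab."""
    return re.sub(r"\r\n(?=[ \t])", "", value)


def is_encoded_word(token: str) -> bool:
    """True if the whole token matches the simplified encoded-word pattern."""
    return ENCODED_WORD.fullmatch(token) is not None


# The header value from the question, assuming a leading space on line two.
folded = "=?UTF-8?Q?=C3=A5\r\n =C3=A4?="
unfolded = unfold(folded)

print(repr(unfolded))                               # the space survives unfolding
print(is_encoded_word(unfolded))                    # whitespace inside: rejected
print(is_encoded_word("=?UTF-8?Q?=C3=A5=C3=A4?="))  # same text on one line

# The RFC's own counter-example splits into four whitespace-separated tokens.
print(len("=?iso-8859-1?q?this is some text?=".split()))
```

Unfolding only removes the CRLF, so the space that introduced the continuation line ends up inside the encoded-text, which is exactly what the quoted paragraph forbids.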
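By contrast, whitespace between encoded-words (which is what I first took the question to be about) is explicitly allowed. Header lines containing encoded-words should be 76 characters or fewer and folded when necessary. Headers folded per RFC 822 are indented on the second and any subsequent lines. RFC 2047 headers should be indented by only a single space. The whitespace between the ?= on the first line and the =? on the next should be suppressed in the output.
See the example at the bottom of page 12 of the RFC: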
encoded form                                displayed as
---------------------------------------------------------------------
(=?ISO-8859-1?Q?a?=                         (ab)
    =?ISO-8859-1?Q?b?=)
Any amount of linear-space-white between 'encoded-word's,
even if it includes a CRLF followed by one or more SPACEs,
is ignored for the purposes of display.
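As a quick check of that example, here is a small sketch using Python's standard `email.header` module. The `display_form` helper is just an illustrative name of mine, and the parentheses from the RFC example are left out so that only the two encoded-words and the whitespace between them are being tested.

```python
from email.header import decode_header, make_header


def display_form(raw: str) -> str:
    """Decode any encoded-words in a raw header value and return display text."""
    return str(make_header(decode_header(raw)))


# Two adjacent encoded-words separated by a fold (CRLF plus one space).
folded = "=?ISO-8859-1?Q?a?=\r\n =?ISO-8859-1?Q?b?="
# The same two encoded-words separated by a single space on one line.
same_line = "=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?="

print(display_form(folded))
print(display_form(same_line))
```

Both forms should come out as "ab": the whitespace that separates two encoded-words is treated as a separator only and does not appear in the displayed text.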