我们有一个流程,通过电子邮件正文中的ESMTP将XML传输给我们。电子邮件正文的字符集指定为 ISO-8859-1 ,并且没有为XML指定编码。根据协议,默认值为 UTF-8 。
问题是我们的XML解析器在遇到®字符时会抛出异常,因为它认为它解析 UTF-8 ,而 UTF-8 中的®字符是2字节,而不是 ISO-8859-1 中的1。
以下是带有XML的示例电子邮件正文:
Delivered-To: ...
Received: ...
Received: ...
Return-Path: ...
Received: ...
Received-SPF: ...
Authentication-Results: ...
Received: ...
Thread-Topic: ...
From: ...
To: ...
Subject: ...
Date: ...
Message-ID: ...
MIME-Version: 1.0
Content-Type: text/plain;
charset="iso-8859-1"
Content-Transfer-Encoding: 8bit
X-Mailer: Microsoft CDO for Windows 2000
Content-Class: urn:content-classes:message
Importance: normal
Priority: normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.3790.4325
<?xml version="1.0"?>
...
<comments>Super Widget®</comments>
...
答案 0 :(得分:1)
XML specification在附录F中有关编码检测的内容:
此外,在许多情况下,除了XML数据流if。之外,还有其他信息来源。
所以是的,由于XML流本身缺少encoding="..."
,你应该依赖外部源,在这种情况下是Content-Type
标题。