Question

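I'm using JavaMail 1.4.1 to read messages from an email account (I have since upgraded to 1.4.5, but the problem is the same), and I'm having trouble with the content encoding: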

POP3Message pop3message;
... 
Object contentObject = pop3message.getContent();
...   
String contentType = pop3message.getContentType();
String content = contentObject.toString();

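Some messages are read correctly, but others come out with strange characters because the wrong encoding is applied. I have noticed that it fails only for one particular content type.

It works fine if contentType is any of the following:

text/plain; charset=ISO-8859-1

text/plain;
  charset="ISO-8859-1"

text/plain;
  charset="ISO-8859-1";
  format="flowed"

text/plain; charset=windows-1252

But it does not work if it is:

text/plain;
  charset="utf-8"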

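If I ask for the encoding of a message with this content type (the UTF-8 one) via pop3message.getEncoding(), I get:

quoted-printable

For such messages, the String value I see in the debugger (and in the database after persisting the object) is:

UbicaciÃ³n (instead of Ubicación)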

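However, if I open the email in a browser-based mail client, it reads without any problem. It is an ordinary message (no attachments, just text), so the message itself seems to be fine.

Any ideas on how to solve this?

Thanks.

Update: this is the code I added to try the getUTF8Content() function suggested by jlordo: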

POP3Message pop3message = (POP3Message) message;
String uid = pop3folder.getUID(message);

//START JUST FOR TESTING PURPOSES
if(uid.trim().equals("1401")){
    Object utfContent = pop3message.getContent();
    System.out.println(utfContent.getClass().getName()); // it is of type String
    //System.out.println(utfContent); // if not commented out, it prints the content of one of the emails I'm having problems with.
    System.out.println(pop3message.getEncoding()); //prints: quoted-printable
    System.out.println(pop3message.getContentType()); //prints: text/plain; charset="utf-8"
    String utfContentString = getUTF8Content(utfContent); // throws java.lang.ClassCastException: java.lang.String cannot be cast to javax.mail.util.SharedByteArrayInputStream
    System.out.println(utfContentString);
}

//END TEST CODE

Answer 1

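How did you find out that these messages contain "strange characters"? Are you displaying the data somewhere? Whatever method you use to display the data may not be handling Unicode characters correctly.

The first step is to determine whether you are receiving the wrong characters or displaying the right characters incorrectly. You can examine the Unicode value of each character in the data (for example, in the String returned from the getContent method) to make sure each character has the correct Unicode value. If they do, the problem lies in the method you are using to display the characters.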
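The check described above can be sketched as follows. This is only an illustration: the class name CodePointDump and the method dump are made up for this example, and the "bad" string is produced by deliberately decoding UTF-8 bytes as ISO-8859-1, which is an assumption about how the corruption in the question arises.

```java
import java.nio.charset.StandardCharsets;

final class CodePointDump {

    /** Returns every code point of s in U+XXXX form, separated by spaces. */
    static String dump(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String good = "Ubicaci\u00f3n";
        // Simulate the suspected fault: UTF-8 bytes decoded as ISO-8859-1.
        String bad = new String(good.getBytes(StandardCharsets.UTF_8),
                                StandardCharsets.ISO_8859_1);
        System.out.println(dump(good));
        System.out.println(dump(bad));
    }
}
```

If the String from getContent() shows U+00C3 U+00B3 where a single U+00F3 is expected, the data was already decoded with the wrong charset before it reached your display code.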

Answer 2

Try this and let me know whether it works:

// contentType is the value returned by pop3message.getContentType()
if (contentType != null && contentType.toLowerCase().contains("utf-8")) {
    content = getUTF8Content(contentObject);
}

// Needs: java.io.IOException, java.io.InputStreamReader,
// java.nio.charset.Charset, javax.mail.util.SharedByteArrayInputStream
// TODO take care of IOException and ClassCastException
public static String getUTF8Content(Object contentObject) throws IOException {
    // possible ClassCastException if getContent() returned something else, e.g. a String
    SharedByteArrayInputStream sbais = (SharedByteArrayInputStream) contentObject;
    InputStreamReader isr = new InputStreamReader(sbais, Charset.forName("UTF-8"));
    int charsRead;
    StringBuilder content = new StringBuilder();
    char[] buffer = new char[1024];
    // possible IOException
    while ((charsRead = isr.read(buffer)) != -1) {
        // append only the characters actually read in this pass
        content.append(buffer, 0, charsRead);
    }
    return content.toString();
}

BTW, is JavaMail 1.4.1 a requirement? The latest version is 1.4.5.
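The update in the question shows that getContent() can return a String, in which case the cast above fails. A more defensive variant might look like the sketch below. It assumes the content arrives either as a String or as some InputStream; ContentDecoder and toText are hypothetical names, and only the standard library is used so the sketch does not depend on a particular JavaMail class.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

final class ContentDecoder {

    /**
     * Turns a message content object into text.
     * A String is returned unchanged (it has already been decoded upstream);
     * an InputStream is read fully and decoded with the given charset.
     */
    static String toText(Object content, Charset charset) {
        if (content instanceof String) {
            return (String) content;
        }
        if (content instanceof InputStream) {
            try (InputStream in = (InputStream) content) {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                byte[] buffer = new byte[1024];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    bytes.write(buffer, 0, n);
                }
                // Decode once, after all bytes are collected, so a multi-byte
                // sequence split across two reads cannot be corrupted.
                return new String(bytes.toByteArray(), charset);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        throw new IllegalArgumentException("Unsupported content: "
                + (content == null ? "null" : content.getClass().getName()));
    }
}
```

Note that when the content is already a String, a wrong decoding has happened before this point and cannot be undone here; the sketch only avoids the ClassCastException.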

Answer 3

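What worked for me was calling getContentType() and checking whether the returned string contains "utf" (that is, whether it names one of the UTF charsets).

If it does, I handle the content differently in that case.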

private String encodeCorrectly(InputStream is) {
    java.util.Scanner s = new java.util.Scanner(is, StandardCharsets.UTF_8.toString()).useDelimiter("\\A");
    return s.hasNext() ? s.next() : "";
}

(InputStream-to-String converter adapted from this answer on SO)

The important part here is using the correct Charset. That solved my problem.
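Rather than only testing for the substring "utf", the charset parameter can be pulled out of the Content-Type string and used directly. The following is a rough sketch under stated assumptions: CharsetSniffer and charsetOf are hypothetical names, and the regular expression is a simplification that covers headers shaped like the ones in the question, not a full MIME parameter parser.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CharsetSniffer {

    // Matches charset=utf-8 as well as charset="utf-8", ignoring case.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\";\\s]+)\"?", Pattern.CASE_INSENSITIVE);

    /** Returns the charset named in contentType, or fallback if none is usable. */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) {
            return fallback;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return fallback;
        }
    }
}
```

The resulting Charset could then be handed to a Scanner or InputStreamReader in place of the hard-coded UTF-8.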

Answer 4

First, you have to add the headers for UTF-8 encoding, as follows:

...
MimeMessage msg = new MimeMessage(session);
msg.setHeader("Content-Type", "text/html; charset=UTF-8");
msg.setHeader("Content-Transfer-Encoding", "8bit");

msg.setFrom(new InternetAddress(doConversion(from)));
msg.setRecipients(javax.mail.Message.RecipientType.TO, address);
msg.setSubject(asunto, "UTF-8");

MimeBodyPart mbp1 = new MimeBodyPart();
mbp1.setContent(text, "text/html; charset=UTF-8");
Multipart mp = new MimeMultipart();
mp.addBodyPart(mbp1);
...

For the 'from' header, however, I convert the characters with the following method:

public String doConversion(String original) {
    if(original == null) return null;
    String converted = original.replaceAll("á", "\u00c3\u00a1");
    converted = converted.replaceAll("Á", "\u00c3\u0081");
    converted = converted.replaceAll("é", "\u00c3\u00a9");
    converted = converted.replaceAll("É", "\u00c3\u0089");
    converted = converted.replaceAll("í", "\u00c3\u00ad");
    converted = converted.replaceAll("Í", "\u00c3\u008d");
    converted = converted.replaceAll("ó", "\u00c3\u00b3");
    converted = converted.replaceAll("Ó", "\u00c3\u0093");
    converted = converted.replaceAll("ú", "\u00c3\u00ba");
    converted = converted.replaceAll("Ú", "\u00c3\u009a");
    converted = converted.replaceAll("ñ", "\u00c3\u00b1");
    converted = converted.replaceAll("Ñ", "\u00c3\u0091");
    converted = converted.replaceAll("€", "\u00c2\u0080");
    converted = converted.replaceAll("¿", "\u00c2\u00bf");
    converted = converted.replaceAll("ª", "\u00c2\u00aa");
    converted = converted.replaceAll("º", "\u00c2\u00ba"); // º is U+00BA, i.e. UTF-8 bytes C2 BA (C2 B0 would be the degree sign °)
    return converted;
}

If you need to add other characters, you can look up the corresponding UTF-8 hexadecimal encoding at http://www.fileformat.info/info/charset/UTF-8/list.htm.
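For the accented letters in the table, each replacement string is simply the character's UTF-8 bytes reinterpreted as ISO-8859-1 characters. Assuming that is the intended effect, the table could be generalised to any character as in this sketch (Utf8AsLatin1 is a hypothetical class name; the method keeps the doConversion name only for comparison):

```java
import java.nio.charset.StandardCharsets;

final class Utf8AsLatin1 {

    /**
     * Encodes the text as UTF-8 and reinterprets each resulting byte as one
     * ISO-8859-1 character, e.g. "á" (U+00E1) becomes "\u00c3\u00a1".
     */
    static String doConversion(String original) {
        if (original == null) {
            return null;
        }
        return new String(original.getBytes(StandardCharsets.UTF_8),
                          StandardCharsets.ISO_8859_1);
    }
}
```

This deliberately produces double-encoded text, so it only helps when something downstream writes the header as ISO-8859-1. JavaMail also offers the InternetAddress(address, personal, charset) constructor and MimeUtility.encodeText for encoding header text, which may remove the need for this workaround.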

Encoding problem when reading email content with JavaMail
