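I have text encoded as quoted-printable. Here is an example of such text (from a Wikipedia article):
```
If you believe that truth=3Dbeauty, then surely=20=
mathematics is the most beautiful branch of philosophy.
```
I'm looking for a Java class that decodes the encoded form into characters, for example `=20` into a space.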
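To make the mapping concrete: each `=XX` escape is two hexadecimal digits giving the value of one byte, so `=3D` is 0x3D (`=`) and `=20` is 0x20 (space). The sketch below only illustrates that single rule; `HexEscapeDemo` and `decodeEscape` are hypothetical names and are not part of any library.

```java
import java.nio.charset.StandardCharsets;

/** Illustration only: how a single quoted-printable "=XX" escape maps to one byte. */
final class HexEscapeDemo {

    /** Decodes one escape such as "=3D"; expects '=' followed by exactly two hex digits. */
    static byte decodeEscape(String escape) {
        if (escape.length() != 3 || escape.charAt(0) != '=') {
            throw new IllegalArgumentException("not an =XX escape: " + escape);
        }
        int hi = Character.digit(escape.charAt(1), 16);
        int lo = Character.digit(escape.charAt(2), 16);
        if (hi < 0 || lo < 0) {
            throw new IllegalArgumentException("not a hex escape: " + escape);
        }
        // The two hex digits are the high and low nibble of the byte value.
        return (byte) ((hi << 4) | lo);
    }

    public static void main(String[] args) {
        byte[] decoded = { decodeEscape("=3D"), decodeEscape("=20") }; // 0x3D is '=', 0x20 is ' '
        System.out.println("[" + new String(decoded, StandardCharsets.US_ASCII) + "]"); // prints "[= ]"
    }
}
```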
Update: thanks to The Elite Gentleman, I now know that I need to use QuotedPrintableCodec:
```java
import org.apache.commons.codec.DecoderException;
import org.apache.commons.codec.net.QuotedPrintableCodec;
import org.junit.Test;

public class QuotedPrintableCodecTest {

    private static final String TXT = "If you believe that truth=3Dbeauty, then surely=20=mathematics is the most beautiful branch of philosophy.";

    @Test
    public void processSimpleText() throws DecoderException {
        // This is the call that throws the DecoderException shown below
        QuotedPrintableCodec.decodeQuotedPrintable(TXT.getBytes());
    }
}
```
But I keep getting the following exception:
```
org.apache.commons.codec.DecoderException: Invalid URL encoding: not a valid digit (radix 16): 109
    at org.apache.commons.codec.net.Utils.digit16(Utils.java:44)
    at org.apache.commons.codec.net.QuotedPrintableCodec.decodeQuotedPrintable(QuotedPrintableCodec.java:186)
```
What am I doing wrong?
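The number in the message is a byte value: 109 is the ASCII code of `m`. A strict decoder treats every `=` as the start of an `=XX` escape, and in `TXT` the `=` right before `mathematics` is followed by `m`, which is not a hex digit. The hypothetical checker below (not part of Commons Codec) mimics that strict reading and reports the first offending byte.

```java
import java.nio.charset.StandardCharsets;

/**
 * Illustration only: mimics a strict decoder that treats every '=' as the start of an
 * "=XX" escape, and reports the first byte after an '=' that is not a hex digit.
 */
final class StrictEscapeCheck {

    /**
     * Returns the offending byte value, or -1 if no non-hex byte follows an '='
     * (an escape cut short by the end of the input is not reported).
     */
    static int firstNonHexAfterEquals(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] != '=') {
                continue;
            }
            // Check the (up to) two bytes that should be hex digits.
            for (int j = i + 1; j <= i + 2 && j < bytes.length; j++) {
                if (Character.digit(bytes[j], 16) < 0) {
                    return bytes[j];
                }
            }
            i += 2; // both digits were valid: skip past them
        }
        return -1;
    }

    public static void main(String[] args) {
        String txt = "If you believe that truth=3Dbeauty, then surely=20=mathematics is the most beautiful branch of philosophy.";
        int bad = firstNonHexAfterEquals(txt);
        System.out.println(bad + " = '" + (char) bad + "'"); // prints "109 = 'm'"
    }
}
```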
Update 2: I found this question on SO and learned about MimeUtility:
```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.StringWriter;

import javax.mail.MessagingException;
import javax.mail.internet.MimeUtility;

import org.junit.Test;

public class QuotedPrintableCodecTest {

    private static final String TXT = "If you believe that truth=3Dbeauty, then surely=20= mathematics is the most beautiful branch of philosophy.";

    @Test
    public void processSimpleText() throws MessagingException, IOException {
        InputStream is = new ByteArrayInputStream(TXT.getBytes());
        // MimeUtility.decode wraps the raw stream in a quoted-printable decoding stream
        BufferedReader br = new BufferedReader(new InputStreamReader(MimeUtility.decode(is, "quoted-printable")));
        StringWriter writer = new StringWriter();
        String line;
        while ((line = br.readLine()) != null) {
            writer.append(line);
        }
        System.out.println("INPUT: " + TXT);
        System.out.println("OUTPUT: " + writer.toString());
    }
}
```
However, the output is still not perfect; it contains an `=`:
```
INPUT: If you believe that truth=3Dbeauty, then surely=20= mathematics is the most beautiful branch of philosophy.
OUTPUT: If you believe that truth=beauty, then surely = mathematics is the most beautiful branch of philosophy.
```
What am I doing wrong now?
Answer (score: 9)
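The Apache Commons Codec QuotedPrintableCodec class is indeed an implementation of the Quoted-Printable section of RFC 1521.
Update: your quoted-printable string is wrong. The example on Wikipedia uses a soft line break: the `=` after `=20` is the last character of its line, and your string drops the line break that follows it.
Soft line breaks: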
> Rule #5 (Soft Line Breaks): The Quoted-Printable encoding REQUIRES
> that encoded lines be no more than 76 characters long. If longer
> lines are to be encoded with the Quoted-Printable encoding, 'soft'
> line breaks must be used. An equal sign as the last character on a
> encoded line indicates such a non-significant ('soft') line break
> in the encoded text. Thus if the "raw" form of the line is a
> single unencoded line that says:
>
>     Now's the time for all folk to come to the aid of
>     their country.
>
> This can be represented, in the Quoted-Printable encoding, as
>
>     Now's the time =
>     for all folk to come=
>      to the aid of their country.
>
> This provides a mechanism with which long lines are encoded in
> such a way as to be restored by the user agent. The 76 character
> limit does not count the trailing CRLF, but counts all other
> characters, including any equal signs.
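Rule #5 on its own can be sketched in a few lines of plain Java: a decoder simply deletes every `=` that is immediately followed by a line break. This is an illustration under stated assumptions: `SoftLineBreaks` and `join` are hypothetical names, a bare LF is accepted alongside CRLF for lenient input, and trailing whitespace between the `=` and the line break is not handled.

```java
/** Illustration only: RFC 1521 rule #5, removing soft line breaks from encoded text. */
final class SoftLineBreaks {

    /** Deletes every '=' that is immediately followed by CRLF (or a bare LF). */
    static String join(String encoded) {
        return encoded.replaceAll("=\\r?\\n", "");
    }

    public static void main(String[] args) {
        // The RFC example, with CRLF line endings as required on the wire.
        String encoded = "Now's the time =\r\nfor all folk to come=\r\n to the aid of their country.";
        System.out.println(join(encoded));
        // prints "Now's the time for all folk to come to the aid of their country."
    }
}
```

Note that the space before `for` survives because it sits before the `=`, and the space before `to` survives because it starts the next line; only the `=` and the line break are removed.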
So your text should look like this:
```java
private static final String CRLF = "\r\n";
private static final String S = "If you believe that truth=3Dbeauty, then surely=20=" + CRLF + "mathematics is the most beautiful branch of philosophy.";
```
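As an alternative to QuotedPrintableCodec, a minimal JDK-only decoder that handles both `=XX` escapes (rule #1) and soft line breaks (rule #5) might look like the sketch below. `QuotedPrintableSketch` is a hypothetical helper, not part of Commons Codec or JavaMail; it assumes ASCII input, assumes the decoded bytes are UTF-8 (real MIME text declares its charset in the headers), and rejects a malformed `=` sequence instead of passing it through.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: minimal quoted-printable decoder for =XX escapes and soft line breaks. */
final class QuotedPrintableSketch {

    static String decode(String encoded) {
        byte[] in = encoded.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            if (in[i] != '=') {
                out.write(in[i]); // literal byte, including hard line breaks
            } else if (i + 2 < in.length && in[i + 1] == '\r' && in[i + 2] == '\n') {
                i += 2; // soft line break "=CRLF": drop all three bytes
            } else if (i + 1 < in.length && in[i + 1] == '\n') {
                i += 1; // lenient: soft line break followed by a bare LF
            } else if (i + 2 < in.length
                    && Character.digit(in[i + 1], 16) >= 0
                    && Character.digit(in[i + 2], 16) >= 0) {
                // "=XX": the two hex digits are the value of one byte
                out.write((Character.digit(in[i + 1], 16) << 4) | Character.digit(in[i + 2], 16));
                i += 2;
            } else {
                throw new IllegalArgumentException("Invalid quoted-printable sequence at index " + i);
            }
        }
        // Assumption: the decoded bytes are UTF-8.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String crlf = "\r\n";
        String s = "If you believe that truth=3Dbeauty, then surely=20=" + crlf
                + "mathematics is the most beautiful branch of philosophy.";
        System.out.println(decode(s));
    }
}
```

With the corrected string `S`, this prints the single line `If you believe that truth=beauty, then surely mathematics is the most beautiful branch of philosophy.` Given the question's original string, where `=mathematics` stays on the same line, the sketch throws, much like the strict codec does.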
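The Javadoc clearly states:
> Rules #3, #4, and #5 of the quoted-printable spec are not implemented yet because the complete quoted-printable spec does not lend itself well into the byte[] oriented codec framework. Complete the codec once the streamable codec framework is ready. The motivation behind providing the codec in a partial form is that it can already come in handy for those applications that do not require quoted-printable line formatting (rules #3, #4, #5), for instance Q codec.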
There is a bug logged against Apache's QuotedPrintableCodec because it does not support soft line breaks.