Reading Unicode

Time: 2010-04-20 04:27:26

Tags: java unicode

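I am using java.io to retrieve text from a server that may output characters such as é. I then print the text with System.err, and those characters come out as '?'. I am using UTF-8 encoding. What is wrong?

int len = 0;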
char[] buffer = new char[1024];
OutputStream os = sock.getOutputStream();
InputStream is = sock.getInputStream();
os.write(query.getBytes("UTF8"));//iso8859_1"));

Reader reader = new InputStreamReader(is, Charset.forName("UTF-8"));
do {
    len = reader.read(buffer);
    if (len > 0) {
        if (outstring == null) {
            outstring = new StringBuffer();
        }
        outstring.append(buffer, 0, len);
    }
} while (len > 0);
System.err.println(outstring);

Edit: I just tried the following code:

StringBuffer b = new StringBuffer();
for (char c = 'a'; c < 'd'; c++) {
    b.append(c);
}
b.append('\u00a5'); // Japanese Yen symbol
b.append('\u01FC'); // Roman AE with acute accent
b.append('\u0391'); // GREEK Capital Alpha
b.append('\u03A9'); // GREEK Capital Omega

for (int i = 0; i < b.length(); i++) {
    System.out.println("Character #" + i + " is " + b.charAt(i));
}
System.out.println("Accumulated characters are " + b);

It also prints garbage:

Character #0 is a
Character #1 is b
Character #2 is c
Character #3 is ¥
Character #4 is ?
Character #5 is ?
Character #6 is ?
Accumulated characters are abc¥???

3 answers:

Answer 0 (score: 2)

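First, verify that the system property file.encoding is actually UTF-8. If it is, the problem is not the code you are running; your terminal program (or whatever else displays the output) cannot render the output correctly.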
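The check described above can be sketched as follows. This is a minimal illustration, not a definitive fix: the class name EncodingCheck and the helper sample() are hypothetical, and wrapping System.err in a UTF-8 PrintStream only helps if the terminal itself decodes UTF-8.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class EncodingCheck {
    // Hypothetical helper: the same characters the question appends one by one.
    static String sample() {
        return "abc\u00a5\u01FC\u0391\u03A9";
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // What the JVM believes the default encoding to be.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // Encode explicitly as UTF-8 instead of relying on the default.
        // The terminal must still interpret the bytes as UTF-8 to show them correctly.
        PrintStream utf8Err = new PrintStream(System.err, true, "UTF-8");
        utf8Err.println(sample());
    }
}
```

If the explicit UTF-8 stream still shows '?' or mojibake, that points at the terminal rather than at the Java code.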

Answer 1 (score: 0)

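Write the text to a file and check how it comes out. If it appears correctly in the file, the problem is your error stream (its encoding is not UTF-8). If the file also contains garbage, the server is probably not sending UTF-8.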
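To see why an output stream with the wrong encoding produces '?', it can help to look at the raw bytes. The sketch below assumes, purely for illustration, that the stream encodes as ISO-8859-1; the class name ReplacementDemo and the helper hex() are hypothetical. String.getBytes replaces characters the target charset cannot represent with that charset's replacement byte, which for ISO-8859-1 is '?' (0x3F).

```java
import java.nio.charset.StandardCharsets;

public class ReplacementDemo {
    // Formats bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "abc\u00a5\u01FC\u0391\u03A9";
        // UTF-8 can represent every character in the string.
        System.out.println("UTF-8:      " + hex(s.getBytes(StandardCharsets.UTF_8)));
        // ISO-8859-1 contains the yen sign (0xA5) but not the last three characters,
        // so each of those is replaced by 0x3F, which is '?'.
        System.out.println("ISO-8859-1: " + hex(s.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

The ISO-8859-1 line ends in A5 3F 3F 3F, which matches the abc¥??? pattern in the question: the yen sign survives and the other three characters are replaced.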

Answer 2 (score: 0)

Your second example produces the following output for me.

Character #0 is a
Character #1 is b
Character #2 is c
Character #3 is ¥
Character #4 is Ǽ
Character #5 is Α
Character #6 is Ω
Accumulated characters are abc¥ǼΑΩ

This code produces a correctly encoded UTF-8 file with the same content.

StringBuilder b = new StringBuilder();
for (char c = 'a'; c < 'd'; c++) {
    b.append(c);
}
b.append('\u00a5'); // Japanese Yen symbol
b.append('\u01FC'); // Roman AE with acute accent
b.append('\u0391'); // GREEK Capital Alpha
b.append('\u03A9'); // GREEK Capital Omega

PrintStream out = new PrintStream("temp.txt", "UTF-8");
for (int i = 0; i < b.length(); i++) {
    out.println("Character #" + i + " is " + b.charAt(i));
}
out.println("Accumulated characters are " + b);
out.close(); // release the file handle once everything is written
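To confirm that such a file really holds UTF-8, one option is to read it back with an explicit charset and compare it with the original string. The following is a minimal sketch under stated assumptions: the class name Utf8RoundTrip and the method roundTrip are hypothetical, and a temporary file stands in for temp.txt.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {
    // Writes text to a temporary file as UTF-8, reads it back as UTF-8,
    // and returns what was read.
    static String roundTrip(String text) {
        try {
            File tmp = File.createTempFile("utf8-check", ".txt");
            try {
                try (PrintStream out = new PrintStream(tmp, "UTF-8")) {
                    out.print(text);
                }
                StringBuilder sb = new StringBuilder();
                try (Reader in = new InputStreamReader(
                        new FileInputStream(tmp), StandardCharsets.UTF_8)) {
                    char[] buf = new char[1024];
                    int n;
                    // read() returns -1 at end of stream
                    while ((n = in.read(buf)) != -1) {
                        sb.append(buf, 0, n);
                    }
                }
                return sb.toString();
            } finally {
                tmp.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String s = "abc\u00a5\u01FC\u0391\u03A9";
        System.out.println("round trip intact: " + s.equals(roundTrip(s)));
    }
}
```

Comparing strings inside the program avoids depending on the console, which is the component under suspicion in the question.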

See also: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)