Question

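Hey, I have just started trying to learn Java and have run into something confusing!

I was typing in an example from the book I am using. It is meant to demonstrate the char data type.

The code is as follows: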

public class CharDemo
{
    public static void main(String[] args)
    {
        char a = 'A';
        char b = (char) (a + 1);
        System.out.println(a + b);
        System.out.println("a + b is " + a + b);
        int x = 75;
        char y = (char) x;
        char half = '\u00AB';
        System.out.println("y is " + y + " and half is " + half);
    }
}

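What confuses me is the statement char half = '\u00AB'. The book states that \u00AB is the code for the symbol '1/2'. As stated, when I compile and run the program from cmd, the symbol produced on this line is in fact a '1/2'.

So everything appears to be working as it should. I decided to play around with the code and try some different Unicode values. I googled multiple Unicode tables and found that none of them are consistent with the result above.

In every one of them I found that the code \u00AB is not '1/2' and is in fact this:

http://www.fileformat.info/info/unic...r/ab/index.htm

So what character set is Java using? I thought Unicode was supposed to be just that: Uni, only one. I have searched for hours and nowhere can I find a character set that states \u00AB is equal to 1/2, yet this is how my Java compiler interprets it.

I must be missing something obvious here! Thanks for any help!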

Answer 1

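This is a well-known problem with console encoding mismatch on the Windows platform.

The Java runtime expects the encoding used by the system console to be the same as the system default encoding. However, Windows uses two separate encodings: the ANSI code page (system default encoding) and the OEM code page (console encoding).

So, when you try to write the Unicode character U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK to the console, the Java runtime assumes the console encoding is the ANSI encoding (Windows-1252 in your case), where this character is represented as the byte 0xAB. However, the actual console encoding is the OEM encoding (CP437 in your case), where 0xAB means ½.

Therefore, printing data to the Windows console with System.out.println() produces wrong results.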
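The mismatch described above can be reproduced without a Windows console. The following is only a minimal sketch: the class name `CodePageMismatch` and the helper `reinterpret` are made up for illustration, and it assumes the runtime ships the `windows-1252` and `IBM437` (CP437) charsets, which it checks before use.

```java
import java.nio.charset.Charset;

class CodePageMismatch {
    // Hypothetical helper: encode text with the charset the writer assumes,
    // then decode the same bytes with the charset the console really uses.
    static String reinterpret(String text, String writerCharset, String consoleCharset) {
        byte[] bytes = text.getBytes(Charset.forName(writerCharset));
        return new String(bytes, Charset.forName(consoleCharset));
    }

    public static void main(String[] args) {
        // Assumption: both charsets exist in this runtime; bail out if not.
        if (!Charset.isSupported("windows-1252") || !Charset.isSupported("IBM437")) {
            System.out.println("Required charsets are not available in this runtime.");
            return;
        }
        // U+00AB is written as the single byte 0xAB in windows-1252 ...
        String shown = reinterpret("\u00AB", "windows-1252", "IBM437");
        // ... and that same byte is read back as a different character in CP437.
        System.out.printf("Written as windows-1252, read as IBM437: U+%04X%n",
                (int) shown.charAt(0));
    }
}
```

If both charsets are present, the character that comes back is U+00BD (½), which matches what the question observed in cmd.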

To get correct results, you can use System.console().writer().println() instead.
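A hedged sketch of that suggestion follows. `System.console()` can return null when no console is attached (for example when output is redirected), so the hypothetical helper `consoleOrStdout` falls back to `System.out` in that case; the class and helper names are illustrative only.

```java
import java.io.Console;
import java.io.PrintWriter;

class ConsoleDemo {
    // Prefer the Console writer, which uses the console's own encoding.
    // Fall back to System.out when System.console() returns null.
    static PrintWriter consoleOrStdout() {
        Console console = System.console();
        return (console != null) ? console.writer() : new PrintWriter(System.out, true);
    }

    public static void main(String[] args) {
        PrintWriter out = consoleOrStdout();
        out.println("quote is \u00AB and half is \u00BD");
        out.flush();
    }
}
```

Whether the glyphs actually render still depends on the console font, so treat this as a starting point rather than a guaranteed fix.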

Answer 2

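The \u00ab character is not the 1/2 character; see the definitive code page on the Unicode.org website.

What you are seeing is (I think) a consequence of using the System.out PrintStream on a platform where the default character encoding is not UTF-8 or Latin-1. Maybe it is some Windows character set, as suggested by @axtavt's answer? (That answer also gives a plausible explanation of why \u00ab is displayed as 1/2 and not as some "splat" character.)

(In Unicode and Latin-1, \u00BD is the code point for the 1/2 character.)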
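One way to check which character a code point denotes, independent of any console, is to ask Java for its Unicode name. This is a small sketch using `Character.getName` (available since Java 7); the class name `CodePointNames` and the helper `describe` are made up for illustration.

```java
class CodePointNames {
    // Format a char as "U+XXXX NAME" using the Unicode name known to the JDK.
    static String describe(char c) {
        return String.format("U+%04X %s", (int) c, Character.getName(c));
    }

    public static void main(String[] args) {
        System.out.println(describe('\u00AB')); // U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
        System.out.println(describe('\u00BD')); // U+00BD VULGAR FRACTION ONE HALF
    }
}
```

Because the output is plain ASCII, it reads the same no matter which code page the console uses.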

Answer 3

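0xAB is 1/2 in good old code page 437, which is what Windows terminals use by default, no matter what code page you actually set.

So, in fact, the char value represents the "«" character to a Java program, and if you render that char in a GUI or run the program on a sane operating system, you will get that character. If you want to see proper output in Windows too, switch the font setting in CMD away from "Raster Fonts" (click the top-left icon, Properties, Font tab). For example, with Lucida Console, I can do this: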

C:\Users\Documents>java CharDemo
131
a + b is AB
y is K and half is ½    

C:\Users\Documents>chcp 1252
Active code page: 1252

C:\Users\Documents>java CharDemo
131
a + b is AB
y is K and half is «

C:\Users\Documents>chcp 437
Active code page: 437
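As an alternative to changing the console code page, a program can encode its output for the code page the console already uses. The sketch below is an assumption-laden illustration, not something from the answer: it wraps `System.out` in a `PrintStream` with an explicit charset name, defaulting to `IBM437` (CP437), and the names `ExplicitEncoding` and `encodeVia` are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class ExplicitEncoding {
    // Wrap a stream so that text is encoded with the named charset.
    static PrintStream wrap(OutputStream target, String charsetName) {
        try {
            return new PrintStream(target, true, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    // Hypothetical helper: return the bytes a PrintStream would emit for the text.
    static byte[] encodeVia(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream ps = wrap(sink, charsetName);
        ps.print(text);
        ps.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // Assumption: the console uses CP437 unless another name is passed in.
        String consoleCharset = args.length > 0 ? args[0] : "IBM437";
        PrintStream out = wrap(System.out, consoleCharset);
        out.println("y is K and half is \u00BD, quote is \u00AB");
    }
}
```

In CP437 the « character is the byte 0xAE and ½ is 0xAB, so a CP437 console should show both correctly; on a terminal that uses a different encoding the same bytes will look wrong, which is the original problem in reverse.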

Answer 4

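One great thing about Java is that it is Unicode based. That means you can use characters from writing systems other than the English alphabet (e.g. Chinese or math symbols), not just in data strings but in function and variable names too.

Here is example code that uses Unicode characters in class names and variable names.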

class 方 {
    String 北 = "north";
    double π = 3.14159;
}

class UnicodeTest {
    public static void main(String[] arg) {
        方 x1 = new 方();
        System.out.println( x1.北 );
        System.out.println( x1.π );
    }
}

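Java was created at a time when the Unicode standard had values defined for a much smaller set of characters. Back then it was felt that 16 bits would be more than enough to encode all the characters that would ever be needed. With that in mind, Java was designed to use UTF-16. In fact, the char data type was originally meant to represent a 16-bit Unicode code point. (Today a char holds one UTF-16 code unit, and characters outside the Basic Multilingual Plane take two of them.)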
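To make the 16-bit limit concrete, here is a small sketch showing that a code point above U+FFFF occupies two chars (a surrogate pair) in a Java String. The class name `SurrogateDemo` and the helper `fromCodePoint` are illustrative; U+1F600 is just an arbitrary example of a supplementary character.

```java
class SurrogateDemo {
    // Build a String from a single code point; supplementary code points
    // (above U+FFFF) become two chars, a high and a low surrogate.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = fromCodePoint(0x1F600);
        System.out.println("chars: " + s.length());                          // 2
        System.out.println("code points: " + s.codePointCount(0, s.length())); // 1
        System.out.printf("units: %04X %04X%n", (int) s.charAt(0), (int) s.charAt(1));
    }
}
```

The two units printed are D83D and DE00, so code that counts or indexes by char can split such a character in half; `codePointCount` and `codePointAt` avoid that.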

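The UTF-8 charset is specified by RFC 2279; the UTF-16 charsets are specified by RFC 2781.

The UTF-16 charsets use sixteen-bit quantities and are therefore sensitive to byte order. In these encodings the byte order of a stream may be indicated by an initial byte-order mark represented by the Unicode character '\uFEFF'. Byte-order marks are handled as follows: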

When decoding, the UTF-16BE and UTF-16LE charsets ignore byte-order marks; when encoding, they do not write byte-order marks.

When decoding, the UTF-16 charset interprets a byte-order mark to indicate the byte order of the stream but defaults to big-endian if there is no byte-order mark; when encoding, it uses big-endian byte order and writes a big-endian byte-order mark.
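The byte-order-mark rules quoted above can be observed directly. This is a minimal sketch using the charsets in `java.nio.charset.StandardCharsets`; the class name `BomDemo` and the helper `hex` are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class BomDemo {
    // Render bytes as space-separated two-digit hex, e.g. "FE FF 00 41".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Encoding: UTF-16 writes a big-endian BOM; UTF-16BE and UTF-16LE write none.
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_16)));   // FE FF 00 41
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_16BE))); // 00 41
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_16LE))); // 41 00

        // Decoding: UTF-16 uses a leading BOM to pick the byte order.
        byte[] littleEndianWithBom = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00};
        System.out.println(new String(littleEndianWithBom, StandardCharsets.UTF_16)); // A
    }
}
```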

Also see this

Answer 5

Well, when I use that code I get the << as I should, and \u00BD should be the 1/2.

http://www.unicode.org/charts/
