Question

I'm confused by UTF-8 strings in D. Can someone explain why the code below gives different results? Why is "abç"[2] == 'ç' false rather than true?

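import std.stdio;

void main()
{
    string s = "abç";
    // Index by code unit: s[i] is a single UTF-8 byte (char).
    for (size_t i = 0; i < s.length; i++)
    {
        dchar c = s[i];
        writefln("%#x", cast(int)c);
    }
    writeln();
    // Iterate by code point: foreach with dchar decodes the UTF-8 sequence.
    foreach (dchar c; s)
    {
        writefln("%#x", cast(int)c);
    }
}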

This code outputs:

[Image: screenshot of the program output]

Answer 1

The Unicode code point of the character ç is greater than 7F (it is E7), so in a UTF-8 string it is represented by more than one char (the pair C3 A7).
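To see where the pair C3 A7 comes from, here is a minimal sketch of the two-byte UTF-8 layout (110xxxxx 10xxxxxx). The helper `encodeTwoByte` is a hypothetical name introduced only for illustration; it is cross-checked against `std.utf.encode` from Phobos.

```d
import std.stdio : writefln;
import std.utf : encode;

// Hypothetical helper (not from the original post): encodes a code point
// in the range U+0080..U+07FF as two UTF-8 bytes.
ubyte[2] encodeTwoByte(dchar cp)
{
    assert(cp >= 0x80 && cp <= 0x7FF);
    ubyte[2] result;
    result[0] = cast(ubyte)(0xC0 | (cp >> 6));   // lead byte:  110xxxxx
    result[1] = cast(ubyte)(0x80 | (cp & 0x3F)); // trail byte: 10xxxxxx
    return result;
}

void main()
{
    auto bytes = encodeTwoByte('\u00E7');
    writefln("%#x %#x", bytes[0], bytes[1]); // prints "0xc3 0xa7"

    // Cross-check with the standard library encoder.
    char[4] buf;
    size_t n = encode(buf, '\u00E7');
    assert(n == 2);
    assert(buf[0] == 0xC3 && buf[1] == 0xA7);
}
```

E7 is 1110 0111 in binary: the top two bits (11) go into the lead byte and the low six bits (100111) go into the trail byte.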

s[2] is just the third char in s, which is the first code unit of 'ç' (the byte C3).

Your first loop prints the "bytes" as they are (as s[i]). Your second loop decodes the code points in s to UTF-32.
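The difference between the two loops can be checked directly. This is a small sketch assuming the source file is saved as UTF-8; it uses `walkLength` from `std.range` and `stride`/`decode` from `std.utf`.

```d
import std.range : walkLength;
import std.utf : decode, stride;

void main()
{
    string s = "abç";

    // Four UTF-8 code units (bytes), but only three code points.
    assert(s.length == 4);
    assert(s.walkLength == 3);

    // Indexing yields individual code units.
    assert(s[2] == 0xC3);
    assert(s[3] == 0xA7);
    assert(s[2] != 'ç');

    // The sequence starting at index 2 is two code units long.
    assert(stride(s, 2) == 2);

    // decode reads the whole sequence and advances the index past it.
    size_t i = 2;
    dchar c = decode(s, i);
    assert(c == 'ç');
    assert(i == 4);
}
```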

E7 and C3 A7 are simply the UTF-32 and UTF-8 encodings of the same character (U+00E7).
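If the goal is to index by character rather than by byte, one option (a sketch, not the only approach) is to convert to a dstring, where each element is a full code point, or to walk the string as a range:

```d
import std.conv : to;
import std.range : drop, front;

void main()
{
    string s = "abç";

    // UTF-32: one dchar per code point, so indexing works as expected.
    dstring d = s.to!dstring;
    assert(d.length == 3);
    assert(d[2] == 'ç');

    // Range primitives on a string decode code points while iterating.
    assert(s.drop(2).front == 'ç');
}
```

The dstring conversion allocates and uses four bytes per character, while the range-based version decodes lazily but takes linear time to reach a given position.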

For reference: http://www.fileformat.info/info/unicode/char/e7/index.htm

Printing characters from a string gives different results
