Question

I wrote this code:

#include <iostream>

int main()
{
   std::wcout << '\u00E1' << std::endl;
}

But it outputs 50081 when compiled with GCC 4.8.1.

I'm probably doing something wrong, but I certainly don't want a number as output. What is going on?

Answer 1

I think this is a bug in g++. '\u00E1' is of type char, but g++ treats it as int. clang++ gets it right.

Consider this related program, which uses an overloaded type_of function to detect the type of a literal:

#include <iostream>

const char *type_of(char) { return "char"; }
const char *type_of(int)  { return "int";  }

int main()
{
   std::cout << "type_of('x')  = " << type_of('x') << "\n";
   std::cout << "type_of('xy') = " << type_of('xy') << "\n";           // line 9
   std::cout << "type_of('\u00E1')  = " << type_of('\u00E1') << "\n";  // line 10
   std::cout << "type_of('\u0100')  = " << type_of('\u0100') << "\n";  // line 11
}

When I compile it with g++ 4.7.2, I get these warnings:

c.cpp:9:47: warning: multi-character character constant [-Wmultichar]
c.cpp:10:52: warning: multi-character character constant [-Wmultichar]
c.cpp:11:52: warning: multi-character character constant [-Wmultichar]

and this output:

type_of('x')  = char
type_of('xy') = int
type_of('á')  = int
type_of('Ā')  = int

With clang++ 3.0, I get only two warnings:

c.cpp:9:47: warning: multi-character character constant [-Wmultichar]
   std::cout << "type_of('xy') = " << type_of('xy') << "\n";
                                              ^
c.cpp:11:52: warning: character unicode escape sequence too long for its type
   std::cout << "type_of('\u0100')  = " << type_of('\u0100') << "\n";

and this output:

type_of('x')  = char
type_of('xy') = int
type_of('á')  = char
type_of('Ā')  = char

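The character literal '\u00E1' has only one c-char in its c-char-sequence, and that c-char happens to be a universal-character-name, so its type is char. g++ incorrectly treats it as a multicharacter constant of type int, whereas clang++ correctly treats it as an ordinary character literal of type char.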

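A character literal like this, whose value is outside the range of char, has an implementation-defined value, but it is still of type char.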

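Since you are writing to std::wcout, you probably want a wide character literal anyway: L'\u00E1', which has type wchar_t, rather than '\u00E1', which (if your compiler handles it correctly) has type char.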

Answer 2

This appears to be a compiler bug.

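According to the standard (2.14.3/1), '\u00E1' is an ordinary character literal (it has no u, U, or L prefix) that contains a single c-char (a universal-character-name), so it has type char.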

So std::wcout << '\u00E1' should use operator<<(char) and print a single character.

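Instead, g++ converts the universal-character-name to its UTF-8 encoding and ends up with the multicharacter literal '\xC3\xA1', which is an int with the value 50081: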

'\u00E1' -> '\xC3\xA1' -> 50081

Why do I get a number instead of a Unicode character?
