Displaying wide characters with printf

Question

I'm trying to understand how printf works with wide characters (wchar_t).

I've written the following code samples:

Sample 1:

#include <stdio.h>
#include <stdlib.h>

int     main(void)
{
    wchar_t     *s;

    s = (wchar_t *)malloc(sizeof(wchar_t) * 2);
    s[0] = 42;
    s[1] = 0;
    printf("%ls\n", s);
    free(s);
    return (0);
}

Output:

*

Everything works here: my character (*, whose code is 42) is displayed correctly.

Sample 2:

I want to display a different character. On my system, wchar_t appears to be 4 bytes wide, so I tried to display the following character: É

#include <stdio.h>
#include <stdlib.h>

int     main(void)
{
    wchar_t     *s;

    s = (wchar_t *)malloc(sizeof(wchar_t) * 2);
    s[0] = 0xC389;
    s[1] = 0;
    printf("%ls\n", s);
    free(s);
    return (0);
}

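But this time there is no output. I tried many values for s[0] from the "Encoding" section (see the previous link), such as 0xC389, 201 and 0xC9, but I never got the É character to display. I also tried %S instead of %ls.

If I call printf like this: printf("<%ls>\n", s), the only character printed is '<'; the output is truncated.

Why do I have this problem? What should I do?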

Answer 1

Why do I have this problem?

Always check errno and the return value of printf!

#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

int main(void)
{
    wchar_t *s;
    s = (wchar_t *) malloc(sizeof(wchar_t) * 2);
    s[0] = 0xC389;
    s[1] = 0;

    if (printf("%ls\n", s) < 0) {
        perror("printf");
    }

    free(s);
    return (0);
}

See the output:

$ gcc test.c && ./a.out
printf: Invalid or incomplete multibyte or wide character

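How to fix it

First, the default locale of a C program is "C" (also known as "POSIX"), which is ASCII-only. You need to add a call to setlocale, specifically setlocale(LC_ALL, "").

If your LC_ALL, LC_CTYPE or LANG environment variables don't select a UTF-8 locale (the empty string tells setlocale to use them), you have to choose a locale explicitly. setlocale(LC_ALL, "C.UTF-8") works on most systems: "C" is standard, and a UTF-8 variant of it is commonly provided, even though the C standard doesn't require one.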

Here is the program with the setlocale call added:

#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void)
{
    wchar_t *s;
    s = (wchar_t *) malloc(sizeof(wchar_t) * 2);
    s[0] = 0xC389;
    s[1] = 0;

    setlocale(LC_ALL, "");

    if (printf("%ls\n", s) < 0) {
        perror("printf");
    }

    free(s);
    return (0);
}

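Output:

$ gcc test.c && ./a.out
쎉

The wrong character is printed because wchar_t represents a wide character (such as a UTF-32 code point), not a multibyte sequence (such as UTF-8). The value 0xC389 is the UTF-8 byte pair for É packed into one integer. As a wide character, however, it is the code point U+C389, the Hangul syllable 쎉. Note that wchar_t is always 32 bits wide in the GNU C Library, but the C standard doesn't require that. If you initialize the character with its code point, U+00C9 (0x000000C9 as a UTF-32 value), it prints correctly: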

#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void)
{
    wchar_t *s;
    s = (wchar_t *) malloc(sizeof(wchar_t) * 2);
    s[0] = 0xC9;
    s[1] = 0;

    setlocale(LC_ALL, "");

    if (printf("%ls\n", s) < 0) {
        perror("printf");
    }

    free(s);
    return (0);
}

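Output:

$ gcc test.c && ./a.out
É

Note that you can also set the locale environment variables (LC_ALL, LC_CTYPE, LANG) on the command line, because setlocale(LC_ALL, "") reads them.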

Answer 2

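One problem is that you're trying to store UTF-8 in a wide-character type. UTF-8 is a byte-oriented encoding: its code units are single bytes, even though one character may take several of them. For UTF-8 you use plain char.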

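Also note that because you are packing a UTF-8 sequence into a multi-byte integer type, you have byte-order (endianness) issues to think about: in memory, 0xC389 might be stored as 0x89 followed by 0xC3. Depending on the types involved, sign extension is possible too. If the value passes through a signed 16-bit type, a 4-byte wchar_t (sizeof(wchar_t) == 4) can end up holding 0xFFFFC389, which is what you would then see for s[0] in a debugger.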

Another possible problem is the terminal or console you print to. Maybe it doesn't support UTF-8, or whichever other encoding you tried?

Answer 3

I found a simple way to print wide characters. The key point is setlocale():

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(int argc, char *argv[])
{
    setlocale(LC_ALL, "");
    // setlocale(LC_ALL, "C.UTF-8"); // this also works

    wchar_t hello_eng[] = L"Hello World!";
    wchar_t hello_china[] = L"世界, 你好!";
    wchar_t *hello_japan = L"こんにちは日本!";
    printf("%ls\n", hello_eng);
    printf("%ls\n", hello_china);
    printf("%ls\n", hello_japan);

    return 0;
}
