Question

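I have a C program that currently reads in Chinese text and stores it as type wchar_t. What I want to do is look for a specific character in the text, but I'm not sure how to refer to that character in code.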

I basically want to say:

wchar_t character;

if (character == 个) {
    return 1;
}

else return 0;

Obviously, some logic is omitted. How can I perform this kind of logic on Chinese text in C?

Edit: Got it working. This code compiles with -std=c99 and prints the character "个".

#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int main() {
        wchar_t test[] = L"\u4E2A";
        setlocale(LC_ALL, "");
        printf("%ls", test);
}

Answer 1

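It depends on your compiler. If it accepts source files saved in a Unicode encoding, you can compare against the actual character directly; otherwise, you can use a wide-character constant written as a \uXXXX escape: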

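#include <stdio.h>
#include <wchar.h>  /* wchar_t */

int main(void)
{
    int i;
    wchar_t chinese[] = L"我不是中国人。";

    /* Walk the wide string until the terminating L'\0'. */
    for(i = 0; chinese[i]; ++i)
    {
        /* Literal character: needs a source encoding the compiler understands. */
        if(chinese[i] == L'不')
            printf("found\n");
        /* Same character as a universal character name: works in ASCII-only source. */
        if(chinese[i] == L'\u4E0D')
            printf("also found\n");
    }
    return 0;
}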

Note that a wide-character string is written L"xxx" and a single wide character is written L'x'. Unicode BMP code points can be specified with \uXXXX.

FYI, I compiled this with Visual Studio 2012 with the source file saved as UTF-8 with BOM, UTF-16 (little-endian), and UTF-16 (big-endian). UTF-8 without BOM did not work.
