Question

Here is my code:

std::string st = "名前hlong"; 
for (int i = 0; i < st.lenght(); i++) 
{ 
   char *ch = st[i];
   if ((int)ch <= 255))
   { 
     //Character is latin. 
   } 
   else 
   { 
     //Character is japanese 
   } 
}

I want to count the number of Japanese and English characters, but it doesn't work. Please help me solve this problem. Thanks, everyone.

Answer 1

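Really, you should not use std::string here, because std::string is byte-oriented and a Japanese character cannot be represented as a single byte. You should use std::wstring instead (or, in C++11, std::u16string and std::u32string for UTF-16 and UTF-32).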
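To make the byte-versus-character point concrete, here is a minimal sketch. It assumes the text is stored as UTF-8 when it sits in a std::string, and the two function names are made up for illustration. The UTF-8 bytes are written as escapes so the result does not depend on the source file's encoding.

```cpp
#include <cstddef>
#include <string>

// Size of the UTF-8 encoding of "名前hlong" held in a byte-oriented std::string.
// Each of the two kanji takes three bytes in UTF-8 (an assumption about the
// encoding; other encodings give other sizes).
std::size_t utf8_byte_count() {
    const std::string bytes = "\xE5\x90\x8D\xE5\x89\x8D" "hlong";
    return bytes.size();  // counts bytes, not characters
}

// Size of the same text held in a std::u32string, where every element is
// one whole Unicode code point.
std::size_t code_point_count() {
    const std::u32string text = U"\u540D\u524Dhlong";
    return text.size();  // counts code points
}
```

With these inputs the first function returns 11 and the second returns 7, which is why a loop over st[i] in the question never sees a single value for 名 or 前.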

Consider the following C++11 code:

#include <string>
#include <iostream>
#include <iomanip>

using namespace std;

int main(void) {
        wstring s = L"Привет , 名前 hlong";
        for(wchar_t c: s)
               cout << "Char code = 0x" << hex << int(c) << endl;
        return 0;
}

Compiled with GCC 4.7 as g++ -finput-charset=utf-8 -std=c++11 test_wstring.cc -o test_wstring, it produces the following output (0x20 is the space character):

Char code = 0x41f
Char code = 0x440
Char code = 0x438
Char code = 0x432
Char code = 0x435
Char code = 0x442
Char code = 0x20
Char code = 0x2c
Char code = 0x20
Char code = 0x540d
Char code = 0x524d
Char code = 0x20
Char code = 0x68
Char code = 0x6c
Char code = 0x6f
Char code = 0x6e
Char code = 0x67

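As you can see, the ASCII and Latin-1 characters fall within the 0-0xFF range (ASCII proper is 0-0x7F), the Cyrillic ones are at 0x400 and above, and the Japanese characters are 0x540d and 0x524d. You should check the Unicode table, as mentioned in the comments, to see which ranges you are interested in. You may also want to look at the std::codecvt facilities for converting between byte-oriented and character-oriented encodings; see http://en.cppreference.com/w/cpp/locale/codecvt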
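Once the text is held as whole code points, the counting from the question could be sketched roughly like this. The names ScriptCounts, is_latin_letter, is_japanese and count_scripts are hypothetical, and the ranges are my own assumption about what "Japanese" should cover (Hiragana, Katakana and the main CJK Unified Ideographs block); they are not exhaustive, and the kanji block is shared with Chinese, so a code point alone cannot tell the two languages apart.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: how many code points fell into each bucket.
struct ScriptCounts {
    std::size_t latin = 0;
    std::size_t japanese = 0;
    std::size_t other = 0;
};

// Basic English letters only (A-Z, a-z); digits and punctuation are not counted.
bool is_latin_letter(char32_t c) {
    return (c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z');
}

// Assumed ranges, taken from the Unicode block list; adjust to your needs.
bool is_japanese(char32_t c) {
    return (c >= 0x3040 && c <= 0x309F)    // Hiragana
        || (c >= 0x30A0 && c <= 0x30FF)    // Katakana
        || (c >= 0x4E00 && c <= 0x9FFF);   // CJK Unified Ideographs (kanji)
}

// Classify every code point of a UTF-32 string into one of the three buckets.
ScriptCounts count_scripts(const std::u32string& text) {
    ScriptCounts counts;
    for (char32_t c : text) {
        if (is_latin_letter(c)) {
            ++counts.latin;
        } else if (is_japanese(c)) {
            ++counts.japanese;
        } else {
            ++counts.other;  // spaces, punctuation, Cyrillic, and so on
        }
    }
    return counts;
}
```

For example, count_scripts(U"\u540D\u524Dhlong") gives 2 Japanese and 5 Latin code points. std::u32string is used here rather than std::wstring because wchar_t is only 16 bits wide on some platforms, where characters outside the Basic Multilingual Plane would be split into two elements.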

How to check for Japanese or English characters
