C++: create a Unicode char from a string

Asked: 2015-01-19 20:28:41

Tags: c++ string utf-8 char

I have a string like this:

string s = "0081";

I need to make a string like this:

string c = "\u0081";

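How can I create a string of length 1 from the original string of length 4?

Edit: My mistake, "\u0081" is not a char (1 byte) but a 2-byte character/string? What I actually have as input is binary, 1000 0001, which is 0x81, and that is where my string "0081" comes from. Would it be easier to go from this 0x81 to the string c = "\u0081", whatever that value is? Thanks for all the help.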

2 answers:

Answer 0: (score: 0)

Here you go:

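#include <cassert>
#include <sstream>
#include <string>

int main()
{
    // Parse the hex digits into a number (substitute your own, e.g. "0081").
    unsigned int x;
    std::stringstream ss;
    ss << std::hex << "1081";
    ss >> x;

    wchar_t wc1 = static_cast<wchar_t>(x);
    wchar_t wc2 = L'\u1081';

    assert(wc1 == wc2);

    // Wide string of length 1 holding that character.
    std::wstring ws(1, wc1);
}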

Answer 1: (score: 0)

Here is the whole process, based on some code I linked in a comment elsewhere.

// CodepointToUTF8 is defined below; declare it before this code in a real file.
string s = "0081";
long codepoint = strtol(s.c_str(), NULL, 16);
string c = CodepointToUTF8(codepoint);

std::string CodepointToUTF8(long codepoint)
{
    std::string out;
    if (codepoint <= 0x7f)
        out.append(1, static_cast<char>(codepoint));
    else if (codepoint <= 0x7ff)
    {
        out.append(1, static_cast<char>(0xc0 | ((codepoint >> 6) & 0x1f)));
        out.append(1, static_cast<char>(0x80 | (codepoint & 0x3f)));
    }
    else if (codepoint <= 0xffff)
    {
        out.append(1, static_cast<char>(0xe0 | ((codepoint >> 12) & 0x0f)));
        out.append(1, static_cast<char>(0x80 | ((codepoint >> 6) & 0x3f)));
        out.append(1, static_cast<char>(0x80 | (codepoint & 0x3f)));
    }
    else
    {
        out.append(1, static_cast<char>(0xf0 | ((codepoint >> 18) & 0x07)));
        out.append(1, static_cast<char>(0x80 | ((codepoint >> 12) & 0x3f)));
        out.append(1, static_cast<char>(0x80 | ((codepoint >> 6) & 0x3f)));
        out.append(1, static_cast<char>(0x80 | (codepoint & 0x3f)));
    }
    return out;
}

Note that this code does no error checking, so if you pass it an invalid code point, it will return an invalid string.
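
For anyone who wants that checking, here is a minimal sketch of one way it could be added. The names `CodepointToUTF8Checked` and `HexToUTF8` are hypothetical helpers invented for this illustration, not part of the code above, and returning `bool` with an output parameter is just one possible design. The sketch assumes that "invalid" means a negative value, a value above U+10FFFF, or a UTF-16 surrogate (U+D800 to U+DFFF).

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: encodes one code point as UTF-8 into `out`.
// Returns false (leaving `out` untouched) for negative values, values above
// U+10FFFF, and UTF-16 surrogates, none of which are valid to encode.
bool CodepointToUTF8Checked(long codepoint, std::string& out)
{
    if (codepoint < 0 || codepoint > 0x10ffff ||
        (codepoint >= 0xd800 && codepoint <= 0xdfff))
        return false;

    std::string result;
    if (codepoint <= 0x7f)
    {
        result += static_cast<char>(codepoint);
    }
    else if (codepoint <= 0x7ff)
    {
        result += static_cast<char>(0xc0 | (codepoint >> 6));
        result += static_cast<char>(0x80 | (codepoint & 0x3f));
    }
    else if (codepoint <= 0xffff)
    {
        result += static_cast<char>(0xe0 | (codepoint >> 12));
        result += static_cast<char>(0x80 | ((codepoint >> 6) & 0x3f));
        result += static_cast<char>(0x80 | (codepoint & 0x3f));
    }
    else
    {
        result += static_cast<char>(0xf0 | (codepoint >> 18));
        result += static_cast<char>(0x80 | ((codepoint >> 12) & 0x3f));
        result += static_cast<char>(0x80 | ((codepoint >> 6) & 0x3f));
        result += static_cast<char>(0x80 | (codepoint & 0x3f));
    }
    out = result;
    return true;
}

// Hypothetical helper: parses a hex string such as "0081" and encodes it.
// Returns false if the string is empty, contains non-hex characters, or names
// an invalid code point. Note that strtol itself tolerates leading whitespace
// and an optional "0x" prefix.
bool HexToUTF8(const std::string& hex, std::string& out)
{
    char* end = nullptr;
    long codepoint = std::strtol(hex.c_str(), &end, 16);

    // Nothing was parsed, or trailing characters were left over.
    if (end == hex.c_str() || *end != '\0')
        return false;

    // Out-of-range input makes strtol return LONG_MAX or LONG_MIN, both of
    // which the range check in CodepointToUTF8Checked rejects.
    return CodepointToUTF8Checked(codepoint, out);
}
```

With this sketch, `HexToUTF8("0081", c)` succeeds and leaves `c` holding the two bytes 0xC2 0x81, which matches the 2-byte observation in the question, while inputs such as "D800" or "110000" are rejected.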