Question

I have a malformed table in sqlite that contains some strings in unicode format, and this is how they appear at the command-line prompt:

sqlite> select * from myTable;
1|条湥却慴畴s|2

In fact, the value is a regular "English" string in ascii. To work out the conversion, I used python to look at the unicode building blocks:

>>> str="条湥却慴畴s"
>>> str
'\xe6\x9d\xa1\xe6\xb9\xa5\xe5\x8d\xb4\xe6\x85\xb4\xe7\x95\xb4s'

In the second phase I tried to convert it, but the result is still shown in the long (escaped) form:

>>> str2 = unicode(str,"utf8")
>>> str2
u'\u6761\u6e65\u5374\u6174\u7574s'

But if I take each pair of bytes and convert it to its ascii representation, I get the original string (except for the trailing 's'; I am not sure what it represents...).

Is there a way to perform all of these steps programmatically in C++ or Objective-C?

Thanks

Answer 1

Ideally the data should be stored as UTF-8, but your input appears to be UTF16-LE, probably created by a Windows program. You have to convert it to UTF8.

In ASCII or UTF8, 'a' is represented as the single byte 0x61.
In UTF16-LE, 'a' is the 2 bytes 0x61 0x00.

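The trailing s is probably due to undefined behavior, because the string is not null-terminated.

Note that a regular C string ends with a single zero byte, whereas a UTF16 string needs 2 zero bytes at the end.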

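#include <codecvt>   // std::codecvt_utf8_utf16 (deprecated since C++17, still available)
#include <cstring>   // std::memcpy
#include <iostream>
#include <locale>    // std::wstring_convert
#include <string>

int main()
{
    // Sample UTF-16LE bytes: each ASCII character is followed by a 0x00 high byte,
    // and the data ends with a two-byte terminator.
    const char buf[] = "\x67\0\x61\0\x6e\0\x65\0\x53\0\x74\0\x61\0\x74\0\x75\0\x73\0\0\0";

    // Copy the bytes into properly aligned char16_t storage instead of casting the
    // char pointer. This assumes a little-endian host.
    std::u16string utf16((sizeof(buf) - 1) / 2, u'\0');
    std::memcpy(&utf16[0], buf, utf16.size() * sizeof(char16_t));

    // Convert up to the first UTF-16 terminator into UTF-8.
    std::string trans =
        std::wstring_convert<std::codecvt_utf8_utf16<char16_t>,
        char16_t>{}.to_bytes(utf16.c_str());
    std::cout << trans << "\n";
    return 0;
}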

(On the Windows platform you would use wchar_t instead of char16_t.)


Convert a unicode text field in a sqlite db to its native ascii representation
