Question

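Is there a recommended cross-platform (Windows and Linux) method for converting strings from UTF-16LE to UTF-8, or should a different method be used in each environment?

I have managed to find a few references to 'iconv' through Google, but for some reason I cannot find samples of basic conversions, such as converting a wchar_t UTF-16 string to UTF-8.

Can anyone recommend a cross-platform method? If you know of references or a guide with samples, I would appreciate it very much.

Thanks, Doori Bar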

Answer 1

If you don't want to use ICU:

Windows: WideCharToMultiByte
Linux: iconv (Glibc)
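
To make the split above concrete, here is a minimal sketch that hides both APIs behind one function. The helper name utf16le_to_utf8 and its return convention are hypothetical, chosen only for illustration. It assumes the input is a byte buffer holding UTF-16LE text without a BOM, and, on the Windows path, that the buffer is 2-byte aligned so it can be read as wchar_t.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <iconv.h>
#endif

/* Hypothetical helper (not part of any library): converts src_bytes bytes of
 * UTF-16LE text into UTF-8. Writes at most dst_cap bytes, adds no terminator,
 * and returns the number of bytes written, or -1 on failure. */
static long utf16le_to_utf8(const void *src, size_t src_bytes,
                            char *dst, size_t dst_cap)
{
    if (src_bytes == 0)
        return 0;
    if (dst_cap == 0)
        return -1;
#ifdef _WIN32
    /* Assumption: src is 2-byte aligned. wchar_t is 16 bits on Windows,
     * so the buffer can be passed to the API directly. */
    int n = WideCharToMultiByte(CP_UTF8, 0,
                                (const wchar_t *)src, (int)(src_bytes / 2),
                                dst, (int)dst_cap, NULL, NULL);
    return n > 0 ? (long)n : -1;
#else
    /* iconv_open takes the target encoding first, then the source. */
    iconv_t cd = iconv_open("UTF-8", "UTF-16LE");
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)src;      /* iconv wants a non-const char ** */
    size_t in_left = src_bytes;
    char *out = dst;
    size_t out_left = dst_cap;

    /* iconv advances in/out and decrements the remaining byte counts. */
    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;               /* invalid input or output buffer too small */
    return (long)(dst_cap - out_left);
#endif
}
```

A destination buffer of 3 * (src_bytes / 2) bytes is always large enough, since a single UTF-16 code unit expands to at most three UTF-8 bytes and a surrogate pair (two units) becomes four.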

Answer 2

Change the encoding to UTF-8 with PowerShell (in PowerShell, -Encoding Unicode means UTF-16LE):

powershell -Command "Get-Content PATH\temp.txt -Encoding Unicode | Set-Content -Encoding UTF8 PATH2\temp.txt"

Answer 3

The open-source ICU library is very commonly used.

Answer 4

I ran into this problem too, and solved it with the Boost.Locale library:

try
{
    // Assumes wchar_t is 2 bytes (Windows), so the buffer can be read as UTF-16 code units.
    std::string utf8 = boost::locale::conv::utf_to_utf<char, short>(
                        (const short*)wcontent.c_str(),
                        (const short*)(wcontent.c_str() + wcontent.length()));
    // Optional second step: re-encode the UTF-8 text as Latin-1.
    content = boost::locale::conv::from_utf(utf8, "ISO-8859-1");
}
catch (const boost::locale::conv::conversion_error& e)
{
    std::cout << "Failed to convert from UTF-8 to " << toEncoding << "!" << std::endl;
    break;
}

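The boost::locale::conv::utf_to_utf function converts a UTF-16LE encoded buffer to UTF-8, and the boost::locale::conv::from_utf function converts a UTF-8 encoded buffer to an ANSI encoding. Make sure the encoding name is correct (here I use Latin-1, ISO-8859-1).

Another tip: on Linux each element (wchar_t) of a std::wstring is 4 bytes, but on Windows it is 2 bytes, so it is better not to use std::wstring to hold a UTF-16LE buffer.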
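
One way to sidestep the wchar_t size difference is to keep the UTF-16LE data as a plain byte buffer and decode it by hand. The sketch below is only an illustration of that idea, not Boost's implementation. The helper names put_utf8 and utf16le_bytes_to_utf8 are hypothetical, and the function rejects unpaired surrogates rather than replacing them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: encodes one code point as UTF-8 into out (4 bytes
 * available) and returns the number of bytes used. */
static size_t put_utf8(uint32_t cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: converts UTF-16LE bytes to UTF-8 without relying on
 * wchar_t. Returns the number of bytes written (no terminator), or -1 on
 * malformed input or when dst is too small. */
static long utf16le_bytes_to_utf8(const unsigned char *src, size_t src_bytes,
                                  char *dst, size_t dst_cap)
{
    size_t i = 0, o = 0;

    if (src_bytes % 2 != 0)
        return -1;                        /* truncated code unit */

    while (i < src_bytes) {
        /* Assemble one little-endian 16-bit code unit. */
        uint32_t cp = (uint32_t)src[i] | ((uint32_t)src[i + 1] << 8);
        i += 2;

        if (cp >= 0xD800 && cp <= 0xDBFF) {           /* high surrogate */
            if (i >= src_bytes)
                return -1;                            /* missing low half */
            uint32_t lo = (uint32_t)src[i] | ((uint32_t)src[i + 1] << 8);
            if (lo < 0xDC00 || lo > 0xDFFF)
                return -1;
            i += 2;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;                                /* stray low surrogate */
        }

        char tmp[4];
        size_t n = put_utf8(cp, tmp);
        if (n > dst_cap - o)
            return -1;                                /* output buffer full */
        memcpy(dst + o, tmp, n);
        o += n;
    }
    return (long)o;
}
```

Because the input is read byte by byte, the same code behaves identically on Windows and Linux regardless of the size of wchar_t or the host byte order.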

Answer 5

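#include <iconv.h>

/* Input is handled as raw UTF-16LE bytes; wchar_t is 4 bytes on Linux. */
char *src = ...;        /* UTF-16LE input bytes */
size_t srclen = ...;    /* input length in bytes */
char *dst = ...;        /* output buffer */
size_t dstlen = ...;    /* output capacity in bytes */
iconv_t conv = iconv_open("UTF-8", "UTF-16LE");
if (conv != (iconv_t)-1) {
    /* iconv advances src/dst and decrements the remaining byte counts */
    iconv(conv, &src, &srclen, &dst, &dstlen);
    iconv_close(conv);
}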

Answer 6

There is also utfcpp, a header-only library.

Answer 7

Another portable C option for converting strings between UTF-8, UTF-16, UTF-32, and wchar is the mdz_unicode library.

Answer 8

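Thanks everyone, this is how I managed to meet the cross-platform Windows and Linux requirement:

Downloaded and installed MinGW and MSYS
Downloaded the libiconv source package
Compiled libiconv via MSYS.

That's it.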

Answer 9

If MSYS2 is installed, the iconv package (installed by default) can be used:

 iconv -f utf-16le -t utf-8 <input.txt >output.txt

Converting UTF-16 to UTF-8 in C under Windows and Linux
