Question

We are unable to convert a Unicode string to a UTF-8 string in order to send it over the network:

// Start with our unicode string.
string unicode = "Convert: \u10A0";

// Get an array of bytes representing the unicode string, two for each character.
byte[] source = Encoding.Unicode.GetBytes(unicode);

// Convert the Unicode bytes to UTF-8 representation.
byte[] converted = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, source);

// Now that we have converted the bytes, save them to a new string.
string utf8 = Encoding.UTF8.GetString(converted);

// Send the converted string using a Microsoft function.
MicrosoftFunc(utf8);

Although we have converted the string to UTF-8, it does not arrive at the other end as UTF-8.

Answer 1

After a troubling and confusing morning, we found the answer to this question.

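The key point we were missing, and the reason this was so confusing, is that the string type is always encoded as 16-bit (2-byte) Unicode, that is, UTF-16. This means that when we call GetString() on the bytes, they are automatically re-encoded to Unicode behind the scenes, and we are no better off than we were in the first place.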
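To make the point concrete, here is a minimal sketch showing that encoding to UTF-8 and then calling GetString() simply hands back the original UTF-16 string. The class and method names (RoundTripDemo, RoundTrip) are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Hypothetical helper: encode to UTF-8 bytes, then decode straight back
    // into a .NET string (which is UTF-16 internally).
    public static string RoundTrip(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(utf8Bytes);
    }

    public static void Main()
    {
        string original = "Convert: \u10A0";
        string roundTripped = RoundTrip(original);

        // The "UTF-8 string" is indistinguishable from the original string.
        Console.WriteLine(original == roundTripped); // True
        Console.WriteLine(roundTripped.Length);      // 10 (UTF-16 code units)
    }
}
```

Any encoding information lives only in the byte array; once the bytes are turned back into a string, that information is gone.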

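When we started seeing character errors and double-byte data at the other end, we knew something was wrong, but the code looked fine at a glance and we could not see the problem. Having learned what is explained above, we realized that if we wanted to preserve the encoding, we needed to send the byte array. Fortunately, MicrosoftFunc() has an overload that takes a byte array instead of a string. This meant we could convert the Unicode string to the encoding of our choice and send it off exactly as we expected. The code changed to: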

// Convert from a Unicode string to an array of bytes (encoded as UTF8).
byte[] source = Encoding.UTF8.GetBytes(unicode); 

// Send the encoded byte array directly! Do not send as a Unicode string.
MicrosoftFunc(source);

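Key points:

To summarize, from the above we can see that:

GetBytes(), among other things, does an Encoding.Convert() from Unicode (because strings are always Unicode) to the encoding the method was called on, and returns an array of encoded bytes.
GetString(), among other things, does an Encoding.Convert() from the encoding the method was called on to Unicode (because strings are always Unicode), and returns the result as a string object.
Convert() actually converts a byte array in one encoding into another byte array in another encoding. Obviously strings cannot be used here (because strings are always Unicode).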
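The three calls can be compared side by side on the single character U+10A0 from the question. This is only an illustrative sketch: EncodingDemo and Hex are hypothetical names introduced here, and the hex output is formatted with BitConverter.ToString().

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: format bytes as dash-separated hex, e.g. "E1-82-A0".
    public static string Hex(byte[] bytes) => BitConverter.ToString(bytes);

    public static void Main()
    {
        string text = "\u10A0";

        // GetBytes(): string (UTF-16) -> bytes in the chosen encoding.
        byte[] utf16 = Encoding.Unicode.GetBytes(text);
        byte[] utf8 = Encoding.UTF8.GetBytes(text);

        // Convert(): bytes in one encoding -> bytes in another encoding.
        byte[] converted = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16);

        // GetString(): bytes in the chosen encoding -> string (UTF-16 again).
        string decoded = Encoding.UTF8.GetString(utf8);

        Console.WriteLine(Hex(utf16));      // A0-10
        Console.WriteLine(Hex(utf8));       // E1-82-A0
        Console.WriteLine(Hex(converted));  // E1-82-A0
        Console.WriteLine(decoded == text); // True
    }
}
```

Note that Encoding.UTF8.GetBytes() on the string and Encoding.Convert() on the UTF-16 bytes yield the same UTF-8 bytes, so the single GetBytes() call is enough when the starting point is a string.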
