Question

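I've run into a problem where the UTF7Encoding class truncates the '+4' sequence, and I'd like to know why this happens. When I use UTF8Encoding to get a string from the byte[] array instead, it seems to work fine. Are there any similar known issues with UTF-8? Essentially, I use the output of this conversion to construct HTML out of an RTF string.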

Here is the snippet:

    UTF7Encoding utf = new UTF7Encoding();
    UTF8Encoding utf8 = new UTF8Encoding();

    string test = "blah blah 9+4";

    char[] chars = test.ToCharArray();
    byte[] charBytes = new byte[chars.Length];

    for (int i = 0; i < chars.Length; i++)
    {
        charBytes[i] = (byte)chars[i];
    }

    string resultString = utf8.GetString(charBytes);
    string resultStringWrong = utf.GetString(charBytes);

    Console.WriteLine(resultString);       // blah blah 9+4
    Console.WriteLine(resultStringWrong);  // blah blah 9

Answer 1

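You're not converting the string to UTF-7 bytes correctly. You should call utf.GetBytes() instead of casting the characters to bytes.

I suspect that in UTF-7 the ASCII code for '+' is actually reserved for encoding international Unicode characters, so a raw '+' in the byte array is read as the start of an encoded sequence rather than as a literal plus sign.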
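
To make the point about '+' concrete, here is a minimal sketch, not a definitive reference. The class and helper names (Utf7PlusDemo, EncodeToAscii, DecodeFromAscii) are made up for illustration; only UTF7Encoding and Encoding.ASCII come from the framework. It assumes a runtime where UTF7Encoding is still available (newer .NET versions mark it obsolete and emit a compiler warning).

```csharp
using System;
using System.Text;

public static class Utf7PlusDemo
{
    // Hypothetical helper: encode text as UTF-7, then view the raw bytes as ASCII.
    public static string EncodeToAscii(string text)
    {
        byte[] bytes = new UTF7Encoding().GetBytes(text);
        return Encoding.ASCII.GetString(bytes);
    }

    // Hypothetical helper: treat ASCII text as raw UTF-7 bytes and decode it.
    public static string DecodeFromAscii(string ascii)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(ascii);
        return new UTF7Encoding().GetString(bytes);
    }

    public static void Main()
    {
        // A literal '+' is escaped as "+-" by the encoder.
        Console.WriteLine(EncodeToAscii("9+4"));     // 9+-4

        // Decoding the properly escaped bytes round-trips.
        Console.WriteLine(DecodeFromAscii("9+-4"));  // 9+4

        // A bare '+' opens an encoded run; "4" alone is not a complete
        // character, which matches the truncation seen in the question.
        Console.WriteLine(DecodeFromAscii("9+4"));
    }
}
```

The bytes produced by casting each char are the third case: they contain a bare '+', so the decoder never sees the "+-" escape that GetBytes() would have written.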

Answer 2

Converting to a byte array by way of a char array like that doesn't work. If you want the string as a charset-specific byte[], do this:

    UTF7Encoding utf = new UTF7Encoding();
    UTF8Encoding utf8 = new UTF8Encoding();

    string test = "blah blah 9+4";

    byte[] utfBytes = utf.GetBytes(test);
    byte[] utf8Bytes = utf8.GetBytes(test);

    string utfString = utf.GetString(utfBytes);
    string utf8String = utf8.GetString(utf8Bytes);

    Console.WriteLine(utfString);
    Console.WriteLine(utf8String);

Output:

    blah blah 9+4
    blah blah 9+4

Utf7Encoding truncation of text
