Question

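I need to read a string from a sequence of UTF-8 bytes. The bytes arrive in separate read operations that do not respect character boundaries, so I can't use System.Text.Encoding.UTF8.GetString. However, the System.Text.Decoder class returned by System.Text.Encoding.UTF8.GetDecoder() appears to be designed for exactly this scenario. One of its out parameters looks like it should indicate when a character has only been partially read.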

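The documentation for Convert (at https://msdn.microsoft.com/en-us/library/h6w985hz(v=vs.110).aspx) says that completed should be false if the output (char[]) buffer is too small, or if not all of the bytes could be converted. See the 4th paragraph of the Remarks.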

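However, even though the documentation says it should be false, completed appears to be TRUE when the bytes of a character have not yet been fully converted.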

I assume I'm doing something wrong (or is this a bug?). If so, how can I detect whether my byte stream has paused in the middle of a character?

Demo code:

using System.Diagnostics; // needed for Debug.Assert

const int outSize = 10;
char[] outBuf = new char[outSize];
byte[] frag1 = new byte[] { 0xE7 };
byte[] frag2 = new byte[] { 0x95, 0xA2 };
var decoder = System.Text.Encoding.UTF8.GetDecoder();
int bytesUsed, charsUsed; bool completed;

// the first byte of the UTF-8 character
decoder.Convert(frag1, 0, frag1.Length, outBuf, 0, outSize, false, out bytesUsed, out charsUsed, out completed);
Debug.Assert( bytesUsed == 1 );
Debug.Assert( charsUsed == 0 );

// completed is true here, but WHY? This assertion fails.
Debug.Assert(!completed);

// the second and third bytes of the UTF-8 character
decoder.Convert(frag2, 0, frag2.Length, outBuf, 0, outSize, false, out bytesUsed, out charsUsed, out completed);
Debug.Assert(bytesUsed == 2);
Debug.Assert(charsUsed == 1);
Debug.Assert(completed);
Debug.Assert( new String(outBuf, 0, 1 ) == "畢" );
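
For comparison, here is a minimal sketch of a check that does not rely on the completed flag at all: it scans the tail of a byte buffer for a UTF-8 lead byte whose continuation bytes have not all arrived. Utf8Tail and IncompleteTailLength are hypothetical names, not part of System.Text. It assumes the buffer holds every byte received since the last character boundary; a fragment that starts mid-character (such as frag2 on its own) cannot be judged in isolation. Malformed input is not diagnosed here and is left to the decoder.

```csharp
using System;

static class Utf8Tail
{
    // Hypothetical helper: returns how many bytes at the end of
    // buffer[0..count) begin a UTF-8 sequence that is still missing
    // continuation bytes. Returns 0 when the buffer ends on a character
    // boundary, or when the tail is malformed.
    public static int IncompleteTailLength(byte[] buffer, int count)
    {
        // A UTF-8 sequence is at most 4 bytes long, so the lead byte of an
        // unfinished sequence is at most 3 bytes from the end; a 4th step
        // back can only find a sequence that is already complete.
        for (int back = 1; back <= 4 && back <= count; back++)
        {
            byte b = buffer[count - back];

            // 10xxxxxx is a continuation byte: keep walking back.
            if ((b & 0xC0) == 0x80) continue;

            int expected;
            if ((b & 0x80) == 0x00) expected = 1;       // 0xxxxxxx
            else if ((b & 0xE0) == 0xC0) expected = 2;  // 110xxxxx
            else if ((b & 0xF0) == 0xE0) expected = 3;  // 1110xxxx
            else if ((b & 0xF8) == 0xF0) expected = 4;  // 11110xxx
            else return 0;                              // invalid lead byte

            // Incomplete only if the sequence needs more bytes than we have.
            return expected > back ? back : 0;
        }
        return 0; // empty buffer, or nothing but continuation bytes
    }
}
```

With the fragments above, IncompleteTailLength(frag1, 1) is 1, and for the three bytes 0xE7, 0x95, 0xA2 together it is 0. A caller could hold back the reported tail bytes and prepend them to the next read before decoding.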

Thanks!

.NET System.Text.Decoder.Convert method returns completed == true in the middle of a character

0 answers: