How to convert a UTF-8 byte array to a string of a given length

Asked: 2017-11-17 14:05:24

Tags: c# string utf-8 byte

Suppose I have a byte array:

var myArr = new byte[] { 0x61, 0x62, 0xc4, 0x85, 0xc4, 0x87 };

It has 6 elements and corresponds to the UTF-8 text abąć, which has 4 letters. Normally you would call

Encoding.UTF8.GetString(myArr);

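to convert it to a string. But suppose myArr is actually larger (there are more bytes at the end), and I know a priori, before converting, that I only want the first 4 letters. How can I efficiently convert this array to a string? Ideally I would also like to get the index in myArr at which the conversion stopped, that is, the position corresponding to the end of the converted string.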

Example:

// 3 more bytes at the end of formerly defined myArr
var myArr = new byte[] { 0x61, 0x62, 0xc4, 0x85, 0xc4, 0x87, 0x01, 0x02, 0x03 };
var str = MyConvert(myArr, 4); // read 4 utf8 letters
// str is "abąć"
// ideally I also want to know that MyConvert stopped at index 6 in myArr

The resulting string str should have str.Length == 4.
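
To make the requirement concrete, here is a minimal sketch of what plain GetString does with both arrays. The class name GetStringDemo and the variable names exact and padded are only illustrative. It assumes the trailing bytes 0x01, 0x02, 0x03 decode as three one-byte control characters, which is how UTF-8 treats them.

```csharp
using System;
using System.Text;

public static class GetStringDemo
{
    public static void Main()
    {
        // The original 6-byte array: "abąć" encoded as UTF-8.
        var exact = new byte[] { 0x61, 0x62, 0xc4, 0x85, 0xc4, 0x87 };
        // The same data followed by 3 unrelated bytes.
        var padded = new byte[] { 0x61, 0x62, 0xc4, 0x85, 0xc4, 0x87, 0x01, 0x02, 0x03 };

        // Decoding the exact array gives the 4 chars we want.
        Console.WriteLine(Encoding.UTF8.GetString(exact).Length);   // 4

        // Decoding the padded array also decodes the trailing bytes.
        Console.WriteLine(Encoding.UTF8.GetString(padded).Length);  // 7

        // The (bytes, index, count) overload only helps if the byte count
        // (6 here) is already known, which is exactly what is missing.
        Console.WriteLine(Encoding.UTF8.GetString(padded, 0, 6).Length); // 4
    }
}
```

So the problem is finding the byte count that corresponds to a given character count without decoding the whole array first.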

1 answer:

Answer 0: (score: 3)

It looks like Decoder has your back here, in particular with its somewhat huge Convert method. I think you want:

var decoder = Encoding.UTF8.GetDecoder();
var chars = new char[4];
decoder.Convert(bytes, 0, bytes.Length, chars, 0, chars.Length,
    true, out int bytesUsed, out int charsUsed, out bool completed);

Complete example using the data from your question:

using System;
using System.Text;

public class Test
{
    static void Main()
    {
        var bytes = new byte[] { 0x61, 0x62, 0xc4, 0x85, 0xc4, 0x87, 0x01, 0x02, 0x03 };
        var decoder = Encoding.UTF8.GetDecoder();
        var chars = new char[4];
        decoder.Convert(bytes, 0, bytes.Length, chars, 0, chars.Length,
            true, out int bytesUsed, out int charsUsed, out bool completed);
        Console.WriteLine($"Completed: {completed}");
        Console.WriteLine($"Bytes used: {bytesUsed}");
        Console.WriteLine($"Chars used: {charsUsed}");
        Console.WriteLine($"Text: {new string(chars, 0, charsUsed)}");
    }
}
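
If you want the MyConvert shape from the question, the same call could be wrapped roughly as follows. This is only a sketch: the class name Utf8Prefix and the tuple return type are my own choices, not part of any library. Note that "4 letters" here means 4 char values, that is, UTF-16 code units, so a character outside the Basic Multilingual Plane counts as two.

```csharp
using System;
using System.Text;

public static class Utf8Prefix
{
    // Hypothetical helper: decodes at most maxChars UTF-16 chars from the
    // start of bytes and reports how many bytes were consumed.
    public static (string Text, int BytesUsed) MyConvert(byte[] bytes, int maxChars)
    {
        // Nothing to decode; also avoids calling Convert with an empty
        // output buffer.
        if (maxChars <= 0 || bytes.Length == 0)
        {
            return (string.Empty, 0);
        }

        var decoder = Encoding.UTF8.GetDecoder();
        var chars = new char[maxChars];

        // Convert stops when the char buffer is full, so bytesUsed is the
        // offset in bytes just past the last decoded character.
        decoder.Convert(bytes, 0, bytes.Length, chars, 0, chars.Length,
            true, out int bytesUsed, out int charsUsed, out _);

        return (new string(chars, 0, charsUsed), bytesUsed);
    }
}
```

With the 9-byte array from the question, Utf8Prefix.MyConvert(bytes, 4) should return the text "abąć" together with a BytesUsed of 6, matching the index the question asks for.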