Question

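I need to limit the length of the byte[] produced by UTF-8 encoding. For example, the byte[] length must be less than or equal to 1000. At first, I wrote the following code: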

            int maxValue = 1000;

            // Truncates by character count; only bounds the byte count if every char is 1 byte.
            if (text.Length > maxValue)
                text = text.Substring(0, maxValue);
            var textInBytes = Encoding.UTF8.GetBytes(text);

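If the string contains only ASCII characters, this works fine, because each character is 1 byte. But for characters outside that range, a single character can take 2, 3, or even 4 bytes, and then the code above breaks. To fix this, I wrote the following: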
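To make the gap between character count and byte count concrete, here is a minimal sketch that relies only on `Encoding.UTF8.GetByteCount` and `GetBytes`. The class name `Utf8Sizes` and its methods are hypothetical; `NaiveTruncate` simply wraps the first snippet above.

```csharp
using System;
using System.Text;

// Hypothetical helper class contrasting string.Length, which counts
// UTF-16 code units, with the number of bytes UTF-8 actually produces.
public static class Utf8Sizes
{
    // Number of bytes Encoding.UTF8 produces for s.
    public static int ByteCount(string s) => Encoding.UTF8.GetByteCount(s);

    // The question's first attempt: truncates by character count, so the
    // result is only guaranteed to fit in maxValue bytes for pure ASCII input.
    public static byte[] NaiveTruncate(string text, int maxValue)
    {
        if (text.Length > maxValue)
            text = text.Substring(0, maxValue);
        return Encoding.UTF8.GetBytes(text);
    }
}
```

With this, `ByteCount("A")` is 1, `ByteCount("\u00E9")` is 2, `ByteCount("\u4E2D")` is 3, and `ByteCount("\U0001F600")` is 4 even though that string's `Length` is 2 (it is a surrogate pair). `NaiveTruncate(new string('\u4E2D', 1500), 1000)` keeps 1000 characters and therefore returns 3000 bytes.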

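            List<byte> textInBytesList = new List<byte>();
            char[] textInChars = text.ToCharArray();
            for (int a = 0; a < textInChars.Length; )
            {
                // Encode a surrogate pair as one unit (4 bytes in UTF-8);
                // encoding either half alone would yield a replacement character.
                int count = (char.IsHighSurrogate(textInChars[a]) && a + 1 < textInChars.Length
                             && char.IsLowSurrogate(textInChars[a + 1])) ? 2 : 1;
                byte[] valueInBytes = Encoding.UTF8.GetBytes(textInChars, a, count);
                // Stop before the running total would exceed maxValue.
                if ((textInBytesList.Count + valueInBytes.Length) > maxValue)
                    break;

                textInBytesList.AddRange(valueInBytes);
                a += count;
            }
            var textInBytes = textInBytesList.ToArray();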

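I haven't tested the code, but I believe it will behave the way I want. However, I don't like how it's done. Is there a better way to do this? Am I missing something, or is there something I don't know about?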

Thanks.

Answer 1

This is my first post on Stack Overflow, so be gentle! This method should solve the problem for you quickly:

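    // Requires: using System; using System.Text;
    public static byte[] GetBytes(string text, int maxArraySize, Encoding encoding) {
        if (string.IsNullOrEmpty(text)) return null;

        // Every char encodes to at least one byte, so a prefix longer than
        // maxArraySize chars can never fit; start from there and walk back.
        int tail = Math.Min(text.Length, maxArraySize);
        // Don't start in the middle of a surrogate pair.
        if (tail > 0 && tail < text.Length && char.IsHighSurrogate(text[tail - 1])) --tail;
        int size = encoding.GetByteCount(text.Substring(0, tail));

        // Drop characters from the end until the encoded prefix fits.
        while (tail > 0 && size > maxArraySize) {
            // Remove a surrogate pair as a unit; counting its halves separately
            // would overstate their size (3 + 3 bytes instead of 4 in UTF-8).
            int step = (tail > 1 && char.IsLowSurrogate(text[tail - 1])
                        && char.IsHighSurrogate(text[tail - 2])) ? 2 : 1;
            size -= encoding.GetByteCount(text.Substring(tail - step, step));
            tail -= step;
        }

        return encoding.GetBytes(text.Substring(0, tail));
    }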

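It is similar to what you are doing, but without the overhead of the extra list, and without having to count from the start of the string every time. I work from the other end instead, assuming of course that every character takes at least one byte. So there is no point starting the iteration any further into the string than maxArraySize (or the total length of the string, if that is shorter).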
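For comparison, here is a different technique from the backwards walk: scan forward once and compute each character's UTF-8 length from its code point, so no intermediate substrings are encoded. This is a minimal sketch that assumes the target encoding is UTF-8 (unlike the method above, it does not accept an arbitrary `Encoding`); `Utf8Truncate` and `TruncateToUtf8` are hypothetical names.

```csharp
using System;
using System.Text;

// Hypothetical helper: UTF-8-only truncation by a single forward scan.
public static class Utf8Truncate
{
    // Returns the UTF-8 bytes of the longest prefix of text whose encoding
    // fits in maxBytes, never splitting a surrogate pair.
    public static byte[] TruncateToUtf8(string text, int maxBytes)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (maxBytes < 0) throw new ArgumentOutOfRangeException(nameof(maxBytes));

        int bytes = 0; // UTF-8 bytes needed for text[0..i)
        int i = 0;
        while (i < text.Length)
        {
            char c = text[i];
            int charCount = 1;
            int size;
            if (c < 0x80) size = 1;          // ASCII
            else if (c < 0x800) size = 2;    // U+0080..U+07FF
            else if (char.IsHighSurrogate(c) && i + 1 < text.Length
                     && char.IsLowSurrogate(text[i + 1]))
            {
                size = 4;      // supplementary code point
                charCount = 2; // consume both halves together
            }
            else size = 3;     // rest of the BMP; Encoding.UTF8 replaces lone
                               // surrogates with U+FFFD, which is also 3 bytes

            if (bytes + size > maxBytes) break;
            bytes += size;
            i += charCount;
        }
        return Encoding.UTF8.GetBytes(text.Substring(0, i));
    }
}
```

For example, `TruncateToUtf8(new string('\u4E2D', 1500), 1000)` returns 999 bytes (333 three-byte characters), and `TruncateToUtf8("ab\U0001F600", 5)` returns only the 2 bytes of `"ab"` instead of part of the 4-byte emoji. The trade-off is that the length rules are hard-coded for UTF-8, whereas the method above delegates the counting to whichever `Encoding` is passed in.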

Then you can call the method like this:

        byte[] bytes = GetBytes(text, 1000, Encoding.UTF8);

Limit the UTF-8 encoded byte length of a string
