UTF8 Byte to String & Winsock GetStream

Date: 2015-02-08 23:17:04

Tags: c# utf-8 winsock utf8-decode

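I am trying to convert a large message, received as bytes, into a string. (Length: 11076 bytes.)

The problem is that the resulting string appears to be missing characters. (Length: 10996.)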

The message arrives over a Winsock connection. Here is the procedure that handles it:

    public static void UpdateClient(UserConnection client)
    {
        string data = null;
        Decoder utf8Decoder = Encoding.UTF8.GetDecoder();

        Console.WriteLine("Iniciando");
        byte[] buffer = ReadFully(client.TCPClient.GetStream(), 0);
        int charCount = utf8Decoder.GetCharCount(buffer, 0, buffer.Length);
        Char[] chars = new Char[charCount];
        int charsDecodedCount = utf8Decoder.GetChars(buffer, 0, buffer.Length, chars, 0);

        foreach (Char c in chars)
        {
            data = data + String.Format("{0}", c);
        }

        int buffersize = buffer.Length;
        Console.WriteLine("Chars is: " + chars.Length);
        Console.WriteLine("Data is: " + data);
        Console.WriteLine("Byte is: " + buffer.Length);
        Console.WriteLine("Size is: " + data.Length);
        Server.Network.ReceiveData.SelectPacket(client.Index, data);
    }

    public static byte[] ReadFully(Stream stream, int initialLength)
    {
        if (initialLength < 1)
        {
            initialLength = 32768;
        }

        byte[] buffer = new byte[initialLength];
        int read = 0;

        int chunk;

        chunk = stream.Read(buffer, read, buffer.Length - read);

        checkreach:
            read += chunk;

            if (read == buffer.Length)
            {
                int nextByte = stream.ReadByte();

                if (nextByte == -1)
                {
                    return buffer;
                }

                byte[] newBuffer = new byte[buffer.Length * 2];
                Array.Copy(buffer, newBuffer, buffer.Length);
                newBuffer[read] = (byte)nextByte;
                buffer = newBuffer;
                read++;
                goto checkreach;
            }

        byte[] ret = new byte[read];
        Array.Copy(buffer, ret, read);
        return ret;
    }

Does anyone have a hint or a solution?

1 Answer:

Answer 0: (score: 0)

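It is completely normal for UTF-8 encoded text to take more bytes than it has characters. In UTF-8, some characters (for example á and ã) are encoded as two or more bytes, so a byte count of 11076 and a string length of 10996 do not by themselves mean that anything was lost.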
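To make the byte/character difference concrete, here is a minimal sketch. The class name `Utf8LengthDemo` and the sample word are illustrative only and do not come from the question; the accented letters are written as `\u` escapes so the result does not depend on how the source file is saved.

```csharp
using System;
using System.Text;

public static class Utf8LengthDemo
{
    // Returns how many bytes the given text occupies once encoded as UTF-8.
    public static int Utf8ByteCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    public static void Main()
    {
        // "ação": 'a' and 'o' take one byte each, 'ç' and 'ã' take two bytes each.
        string text = "a\u00E7\u00E3o";
        byte[] bytes = Encoding.UTF8.GetBytes(text);

        Console.WriteLine("Chars is: " + text.Length);
        Console.WriteLine("Byte is: " + bytes.Length);

        // Decoding the same bytes gives back the original string, nothing is lost.
        string roundTrip = Encoding.UTF8.GetString(bytes);
        Console.WriteLine("Round trip equal: " + (roundTrip == text));
    }
}
```

By the same arithmetic, the 80-byte gap in the question would be consistent with, for instance, 80 two-byte characters in the message, although other mixes of multi-byte characters could produce the same difference.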

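The ReadFully method returns garbage if you use it to read more than fits in the initial buffer, or if it cannot read the whole stream with a single Read call, so you should not use it. The way the char array is converted to a string is also extremely slow. Just use a StreamReader to read the stream and decode it to a string: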

    public static void UpdateClient(UserConnection client) {
      string data;
      using (StreamReader reader = new StreamReader(client.TCPClient.GetStream(), Encoding.UTF8)) {
        data = reader.ReadToEnd();
      }
      Console.WriteLine("Data is: " + data);
      Console.WriteLine("Size is: " + data.Length);
      Server.Network.ReceiveData.SelectPacket(client.Index, data);
    }
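If the raw bytes are still needed (for example, to log the byte count as the question does), one possible alternative is to loop over Read until it returns 0 and decode once at the end. The following is only a sketch under that assumption: the names `StreamText` and `ReadAllUtf8` are hypothetical, and a `MemoryStream` stands in for the network stream so the example runs on its own.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamText
{
    // Hypothetical helper: collects every byte until Read reports end of stream,
    // then decodes the whole payload as UTF-8 in one step.
    public static string ReadAllUtf8(Stream stream)
    {
        byte[] chunk = new byte[8192];
        using (MemoryStream collected = new MemoryStream())
        {
            int read;
            // Read may return fewer bytes than requested, so keep going until it returns 0.
            while ((read = stream.Read(chunk, 0, chunk.Length)) > 0)
            {
                collected.Write(chunk, 0, read);
            }
            // Decoding once avoids cutting a multi-byte character at a chunk boundary.
            return Encoding.UTF8.GetString(collected.ToArray());
        }
    }

    public static void Main()
    {
        // Stand-in for the bytes a socket would deliver.
        byte[] payload = Encoding.UTF8.GetBytes("Ol\u00E1, informa\u00E7\u00E3o");
        using (MemoryStream fake = new MemoryStream(payload))
        {
            string text = ReadAllUtf8(fake);
            Console.WriteLine("Byte is: " + payload.Length);
            Console.WriteLine("Size is: " + text.Length);
        }
    }
}
```

Note that on a network stream both this loop and ReadToEnd only finish when the remote side closes or shuts down the connection. If the connection stays open between messages, the protocol would need its own framing, such as a length prefix or a delimiter, which is outside what the question shows.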