Changing the encoding of a very large stream

Time: 2016-02-12 17:20:35

Tags: c# character-encoding

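I'm getting an OutOfMemoryException when changing the encoding of a stream. Until now the streams have been relatively small (under 50 MB), but I have now run into a different case where each stream is about 1.78 GB, so they are quite large.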

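As for infrastructure, this runs in an Azure Cloud Service with 7 GB of memory, and we are in the process of scaling that up. (It runs as an x64 process.) I know the underlying problem is that I have too many copies of reportStream in memory at once (more than 5 or 6).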

Exception:

System.OutOfMemoryException: Array dimensions exceeded supported range.
at System.Text.Encoding.GetChars(System.Byte[] bytes, System.Int32 index, System.Int32 count) at offset 9
at System.Text.Encoding.Convert(System.Text.Encoding srcEncoding, System.Text.Encoding dstEncoding, System.Byte[] bytes, System.Int32 index, System.Int32 count) at offset 61
at System.Text.Encoding.Convert(System.Text.Encoding srcEncoding, System.Text.Encoding dstEncoding, System.Byte[] bytes) at offset 21
at MyNamespace.MyClass.ChangeEncoding(System.IO.Stream reportStream) at offset 72 in ... 

Code:

    private static Stream ChangeEncoding(Stream reportStream)
    {
        var utf8 = Encoding.UTF8;

        // The reports aren't actually in ASCII encoding.
        // They're in a superset of ASCII that's specific to Windows called "Windows-1252".
        // Windows-1252 contains some special characters, whereas ASCII doesn't have any special
        // characters at all.
        // https://en.wikipedia.org/wiki/Windows-1252
        var win = Encoding.GetEncoding("Windows-1252");

        var length = (int) reportStream.Length;
        var buffer = new byte[length];
        int count;
        var sum = 0;

        // Read until Read method returns 0 (end of the stream has been reached)    
        while ((count = reportStream.Read(buffer, sum, length - sum)) > 0)
        {
            sum += count;
        }

        var convertedBytes = Encoding.Convert(win, utf8, buffer);

        var outputStream = new MemoryStream();

        outputStream.Write(convertedBytes, 0, convertedBytes.Length);

        // Reset the position otherwise it won't zip and subsequently upload the report.
        outputStream.Position = 0;

        return outputStream;
    }

To fix this, how can I change this code so that it converts the encoding in chunks rather than converting the whole stream at once?
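For reference, here is a minimal sketch of what chunked conversion could look like, assuming the caller can supply the destination stream. The `StreamTranscoder` class, the `Transcode` method, and the `bufferChars` parameter are hypothetical names, not part of the code above. It lets `StreamReader` decode and `StreamWriter` encode through one small reusable `char[]`, so memory use should no longer depend on the stream length.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original code.
public static class StreamTranscoder
{
    // Copies `input` to `output`, decoding with `source` and encoding with `target`
    // one buffer at a time, so only `bufferChars` characters are held at once.
    public static void Transcode(Stream input, Stream output,
                                 Encoding source, Encoding target,
                                 int bufferChars = 81920)
    {
        // leaveOpen: true so disposing the reader/writer does not close the caller's streams.
        using (var reader = new StreamReader(input, source, false, 4096, leaveOpen: true))
        using (var writer = new StreamWriter(output, target, 4096, leaveOpen: true))
        {
            var buffer = new char[bufferChars];
            int read;

            // Read returns 0 once the end of the input stream has been reached.
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                writer.Write(buffer, 0, read);
            }
        } // Disposing the writer flushes any buffered bytes to `output`.
    }
}
```

A call might look like `StreamTranscoder.Transcode(reportStream, outputStream, Encoding.GetEncoding("Windows-1252"), new UTF8Encoding(false));` followed by `outputStream.Position = 0;`. Passing `new UTF8Encoding(false)` instead of `Encoding.UTF8` keeps `StreamWriter` from writing a byte-order mark, which matches what `Encoding.Convert` produced. For a 1.78 GB input the destination probably should not be a `MemoryStream`, since that is backed by a single byte array; a temporary `FileStream` is one option. On .NET Core and later, Windows-1252 is only available after registering `CodePagesEncodingProvider`.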

0 answers:

No answers yet