Question

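Our software decompresses certain byte data through a GZipStream, which reads its data from a MemoryStream. The data is decompressed in 4 KB blocks and written into another MemoryStream.

We have noticed that the process allocates far more memory than the decompressed data actually occupies.

Example: a compressed byte array of 2,425,536 bytes is decompressed to 23,050,718 bytes. The memory profiler we use shows that the method MemoryStream.set_Capacity(Int32 value) allocated 67,104,936 bytes. That is a factor of 2.9 between the memory reserved and the memory actually written.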

Note: MemoryStream.set_Capacity is called from MemoryStream.EnsureCapacity, which is called from MemoryStream.Write in our function.

Why does the MemoryStream reserve so much capacity, even though it only appends 4 KB blocks?

Here is the code snippet that decompresses the data:

private byte[] Decompress(byte[] data)
{
    using (MemoryStream compressedStream = new MemoryStream(data))
    using (GZipStream zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
    using (MemoryStream resultStream = new MemoryStream())
    {
        byte[] buffer = new byte[4096];
        int iCount = 0;

        while ((iCount = zipStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            resultStream.Write(buffer, 0, iCount);
        }
        return resultStream.ToArray();
    }
}

Note: in case it is relevant, this is the system configuration:

Windows XP 32-bit,
.NET 3.5
Compiled with Visual Studio 2008

Answer 1

Because this is the algorithm it uses to expand its capacity:

public override void Write(byte[] buffer, int offset, int count) {

    //... Removed Error checking for example

    int i = _position + count;
    // Check for overflow
    if (i < 0)
        throw new IOException(Environment.GetResourceString("IO.IO_StreamTooLong"));

    if (i > _length) {
        bool mustZero = _position > _length;
        if (i > _capacity) {
            bool allocatedNewArray = EnsureCapacity(i);
            if (allocatedNewArray)
                mustZero = false;
        }
        if (mustZero)
            Array.Clear(_buffer, _length, i - _length);
        _length = i;
    }

    //... 
}

private bool EnsureCapacity(int value) {
    // Check for overflow
    if (value < 0)
        throw new IOException(Environment.GetResourceString("IO.IO_StreamTooLong"));
    if (value > _capacity) {
        int newCapacity = value;
        if (newCapacity < 256)
            newCapacity = 256;
        if (newCapacity < _capacity * 2)
            newCapacity = _capacity * 2;
        Capacity = newCapacity;
        return true;
    }
    return false;
}

public virtual int Capacity 
{
    //...

    set {
         //...

        // MemoryStream has this invariant: _origin > 0 => !expandable (see ctors)
        if (_expandable && value != _capacity) {
            if (value > 0) {
                byte[] newBuffer = new byte[value];
                if (_length > 0) Buffer.InternalBlockCopy(_buffer, 0, newBuffer, 0, _length);
                _buffer = newBuffer;
            }
            else {
                _buffer = null;
            }
            _capacity = value;
        }
    }
}

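So every time the capacity limit is hit, the capacity doubles. The reason is that the Buffer.InternalBlockCopy operation is slow for large arrays, so if the buffer had to be resized on every Write call, performance would drop significantly.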
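To watch the doubling on a real MemoryStream, a small probe along these lines can help. It is only a sketch: CapacityProbe and ObserveCapacities are hypothetical names introduced here, and the values reported depend on the EnsureCapacity logic of the runtime in use.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not part of the original question.
public static class CapacityProbe
{
    // Writes blockCount blocks of blockSize bytes into a fresh MemoryStream and
    // returns every distinct Capacity value the stream reported along the way.
    public static List<int> ObserveCapacities(int blockSize, int blockCount)
    {
        var seen = new List<int>();
        byte[] block = new byte[blockSize];

        using (var stream = new MemoryStream())
        {
            for (int n = 0; n < blockCount; n++)
            {
                stream.Write(block, 0, block.Length);

                // Record the capacity only when it changed since the last write.
                if (seen.Count == 0 || seen[seen.Count - 1] != stream.Capacity)
                    seen.Add(stream.Capacity);
            }
        }
        return seen;
    }
}
```

With the EnsureCapacity code quoted above, CapacityProbe.ObserveCapacities(4096, 10) should report 4096, 8192, 16384, 32768 and 65536: five backing arrays for 40,960 bytes of data.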

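There are a few things you can do to improve this. You can set the initial capacity to at least the size of the compressed array, and you can then grow the capacity by a factor smaller than 2.0 to reduce the amount of memory you use.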

const double ResizeFactor = 1.25;

private byte[] Decompress(byte[] data)
{
    using (MemoryStream compressedStream = new MemoryStream(data))
    using (GZipStream zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
    using (MemoryStream resultStream = new MemoryStream((int)(data.Length * ResizeFactor))) //Set the initial size to the compressed size + 25%.
    {
        byte[] buffer = new byte[4096];
        int iCount = 0;

        while ((iCount = zipStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            int needed = (int)resultStream.Length + iCount;
            if (resultStream.Capacity < needed)
            {
                //Resize to 125% instead of 200%, but never below what this Write needs;
                //otherwise Write would fall back to its own doubling.
                resultStream.Capacity = Math.Max(needed, (int)(resultStream.Capacity * ResizeFactor));
            }

            resultStream.Write(buffer, 0, iCount);
        }
        return resultStream.ToArray();
    }
}

If you want to, you can use fancier algorithms, such as resizing based on the current compression ratio:

const double MinResizeFactor = 1.05;

private byte[] Decompress(byte[] data)
{
    using (MemoryStream compressedStream = new MemoryStream(data))
    using (GZipStream zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
    using (MemoryStream resultStream = new MemoryStream((int)(data.Length * MinResizeFactor))) //Set the initial size to the compressed size + the minimum resize factor.
    {
        byte[] buffer = new byte[4096];
        int iCount = 0;

        while ((iCount = zipStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            int needed = (int)resultStream.Length + iCount;
            if (resultStream.Capacity < needed)
            {
                double sizeRatio = ((double)resultStream.Position + iCount) / (compressedStream.Position + 1); //The +1 is to prevent divide by 0 errors, it may not be necessary in practice.

                //Resize to minimum resize factor of the current capacity or the
                // compressed stream length times the compression ratio + min resize
                // factor, whichever is larger.
                double target = Math.Max(resultStream.Capacity * MinResizeFactor,
                                         (sizeRatio + (MinResizeFactor - 1)) * compressedStream.Length);

                //Capacity is an int, so cast; never go below what this Write needs.
                resultStream.Capacity = Math.Max(needed, (int)target);
            }

            resultStream.Write(buffer, 0, iCount);
        }
        return resultStream.ToArray();
    }
}

Answer 2

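MemoryStream doubles its internal buffer when it runs out of space. This can lead to 2x waste. I cannot tell why you are seeing more than that, but this basic behavior is expected.

If you do not like this behavior, write your own stream that stores its data in smaller chunks (for example, a List<byte[]> of 64 KB arrays). Such an algorithm bounds the waste at 64 KB.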
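A minimal sketch of such a chunked store follows. ChunkedBuffer and its members are hypothetical names chosen for illustration, not part of the .NET class library, and a real replacement would likely derive from Stream and also support reading and seeking.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: stores written bytes in fixed 64 KB chunks so that growing
// never copies existing data and never reserves more than one chunk of slack.
public sealed class ChunkedBuffer
{
    private const int ChunkSize = 64 * 1024;
    private readonly List<byte[]> _chunks = new List<byte[]>();
    private long _length;

    public long Length { get { return _length; } }

    // Total bytes reserved; always less than Length + ChunkSize.
    public long Reserved { get { return (long)_chunks.Count * ChunkSize; } }

    public void Write(byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            int chunkIndex = (int)(_length / ChunkSize);
            int chunkOffset = (int)(_length % ChunkSize);

            // Add one new chunk when the current ones are full.
            if (chunkIndex == _chunks.Count)
                _chunks.Add(new byte[ChunkSize]);

            int n = Math.Min(count, ChunkSize - chunkOffset);
            Buffer.BlockCopy(buffer, offset, _chunks[chunkIndex], chunkOffset, n);
            offset += n;
            count -= n;
            _length += n;
        }
    }

    // Copies all chunks into one contiguous array of exactly Length bytes.
    public byte[] ToArray()
    {
        byte[] result = new byte[_length];
        long remaining = _length;
        int dst = 0;
        foreach (byte[] chunk in _chunks)
        {
            int n = (int)Math.Min(remaining, ChunkSize);
            Buffer.BlockCopy(chunk, 0, result, dst, n);
            dst += n;
            remaining -= n;
        }
        return result;
    }
}
```

In the Decompress loop from the question, resultStream.Write(buffer, 0, iCount) could be swapped for a call to ChunkedBuffer.Write. The final ToArray still needs one contiguous allocation of the full decompressed size.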

Answer 3

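It looks like you are looking at the total amount of memory allocated, not at the last call. Since the memory stream doubles its size on each reallocation, the total allocated memory is approximately a sum of powers of 2:

$$\sum_{i=1}^{k} 2^{i} = 2^{k+1} - 2$$

(where k is the number of reallocations, for example k = 1 + \log_2 StreamSize), which is about what you are seeing.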
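The sum can be checked against the numbers in the question by replaying the EnsureCapacity rule quoted in Answer 1. This is a sketch under assumptions: GrowthModel is a hypothetical name, and the model assumes every Read returns a full 4096-byte block until the data runs out, which GZipStream does not guarantee.

```csharp
using System;

// Hypothetical model for illustration; it allocates nothing, it only adds up
// the array sizes that the quoted EnsureCapacity logic would request.
public static class GrowthModel
{
    public static long TotalAllocated(long finalLength, int blockSize,
                                      out int allocations, out long finalCapacity)
    {
        long capacity = 0, position = 0, total = 0;
        allocations = 0;

        while (position < finalLength)
        {
            long count = Math.Min(blockSize, finalLength - position);
            long needed = position + count;

            if (needed > capacity)
            {
                // Same rule as EnsureCapacity: at least 256, at least double.
                long newCapacity = Math.Max(needed, 256);
                if (newCapacity < capacity * 2)
                    newCapacity = capacity * 2;

                capacity = newCapacity;
                total += capacity;   // a brand-new array of this size is created
                allocations++;
            }
            position = needed;
        }

        finalCapacity = capacity;
        return total;
    }
}
```

For 23,050,718 bytes written in 4096-byte blocks, the model yields 14 arrays, from 4,096 bytes up to 33,554,432 bytes, totalling 67,104,768 bytes. That is 168 bytes short of the 67,104,936 reported by the profiler. If each byte[] carries a 12-byte header on a 32-bit runtime (an assumption, not something the question states), 14 headers would account for exactly that difference.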

Answer 4

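Well, increasing the capacity of the stream means creating a whole new array with the new capacity and copying the old one over. That is very expensive, and if you did it for every Write, your performance would suffer a lot. So instead, the MemoryStream expands by more than is necessary. If you want to improve on that behavior and you know the total capacity required, simply use the MemoryStream constructor that takes a capacity parameter :) You can then also use MemoryStream.GetBuffer instead of ToArray.

You are also seeing the discarded old buffers in the memory profiler (for example, from the growth from 8 MiB to 16 MiB, and so on).

Of course, you do not need a single contiguous array, so a better idea might be to write a memory stream of your own that uses multiple arrays created as needed, in chunks as big as necessary, and then copy everything into the output byte[] at the end (if you even need the byte[] at all; quite likely, that is a design problem).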
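The capacity constructor and GetBuffer suggestion from this answer could look roughly like the sketch below. PresizedDecompress and the expectedLength parameter are hypothetical: the caller has to know, or have stored alongside the compressed data, how large the decompressed result will be.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration.
public static class PresizedDecompress
{
    // expectedLength is an assumption: the decompressed size must be known up front.
    // If it is too small, MemoryStream falls back to its usual doubling.
    public static ArraySegment<byte> Decompress(byte[] data, int expectedLength)
    {
        using (var compressedStream = new MemoryStream(data))
        using (var zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
        {
            // One allocation of the final size instead of a series of doublings.
            var resultStream = new MemoryStream(expectedLength);
            byte[] buffer = new byte[4096];
            int count;

            while ((count = zipStream.Read(buffer, 0, buffer.Length)) > 0)
                resultStream.Write(buffer, 0, count);

            // GetBuffer returns the backing array without copying it;
            // only the first Length bytes hold valid data.
            return new ArraySegment<byte>(resultStream.GetBuffer(), 0, (int)resultStream.Length);
        }
    }
}
```

Returning an ArraySegment<byte> is a design choice made here to carry the valid length together with the buffer. When expectedLength is exact, the segment covers the whole array and no second copy is made, unlike with ToArray.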

Why does C# MemoryStream reserve so much memory?