Question

How can I concatenate huge lists without increasing memory usage?

Consider the following code snippet:

 Console.WriteLine($"Initial memory size: {Process.GetCurrentProcess().WorkingSet64 /1024 /1024} MB");
 int[] a = Enumerable.Range(0, 1000 * 1024 * 1024 / 4).ToArray();
 int[] b = Enumerable.Range(0, 1000 * 1024 * 1024 / 4).ToArray();
 Console.WriteLine($"Memory size after lists initialization: {Process.GetCurrentProcess().WorkingSet64 / 1024 / 1024} MB");
 List<int> concat = new List<int>();
 concat.AddRange(a.Skip(500 * 1024 * 1024 / 4));
 concat.AddRange(b.Skip(500 * 1024 * 1024 / 4));
 Console.WriteLine($"Memory size after lists concatenation: {Process.GetCurrentProcess().WorkingSet64 / 1024 / 1024} MB");

The output is:

Initial memory size: 12 MB
Memory size after lists initialization: 2014 MB
Memory size after lists concatenation: 4039 MB

I would like memory usage to stay at 2014 MB after the concatenation, without modifying a and b.

Answer 1

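If you need a List<int>, you can't do this. A List<int> always contains its data directly, so by the time you have two arrays with (say) 100 elements each, plus a list created by concatenating the two, you have 400 independent elements (100 + 100 in the arrays and 200 in the list). You can't change that.

What you're looking for is a way of not creating an independent copy of the data. If you're only searching through it (which is what it sounds like from the comments), you can use an IEnumerable<int> created with LINQ: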

IEnumerable<int> concat = a.Concat(b);

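If you need something like an IReadOnlyList<T> or even an IList<T>, you could implement those interfaces to create an adapter over multiple arrays, but you would probably have to write that yourself. If you can stick with IEnumerable<T>, using LINQ will be much simpler.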
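
As a small sketch of why Concat does not copy anything: the resulting sequence is evaluated lazily, so a change made to a source array after calling Concat is still visible when the sequence is enumerated. The tiny arrays, the value 42, and the class name ConcatIsLazyDemo are illustrative assumptions, not part of the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ConcatIsLazyDemo
{
    static void Main()
    {
        int[] a = { 1, 2, 3 };
        int[] b = { 4, 5, 6 };

        // No elements are copied here; concat only refers to a and b.
        IEnumerable<int> concat = a.Concat(b);

        // Mutate a source array after building the sequence.
        a[0] = 42;

        // Enumeration reads the arrays now, so the change is visible.
        Console.WriteLine(string.Join(", ", concat)); // 42, 2, 3, 4, 5, 6
    }
}
```

The flip side is that every enumeration walks both arrays again, so this suits a few passes over the data rather than repeated random access.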

Answer 2

I would suggest a few optimizations:

Initialize a and b as IEnumerable<int>, without calling ToArray():

int size = 1000 * 1024 * 1024 / 4;
IEnumerable<int> a = Enumerable.Range(0, size);
IEnumerable<int> b = Enumerable.Range(0, size);

Initialize concat with a known capacity:

List<int> concat = new List<int>(size);

As a result, I get the following output:

Initial memory size: 12 MB
Memory size after lists initialization: 13 MB
Memory size after lists concatenation: 1021 MB

If you only want to search the concatenation, you can do the following without any additional allocation:

IEnumerable<int> concat = a.Skip(500 * 1024 * 1024 / 4).Concat(b.Skip(500 * 1024 * 1024 / 4));
int search = concat.Count(i => i % 2 == 0);
Console.WriteLine($"Search result: {search}");

Answer 3

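"They are persistent. I only need to concatenate them, do some searching, and then dispose of the concatenated list."

If you only need to do some searching, why concatenate in the first place? Search the two arrays separately.

There may be cases where what you are searching for bridges the two arrays. If so, to keep things simple without paying the memory price, just implement a wrapper that simulates the operation without actually performing it: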

using System.Collections;
using System.Collections.Generic;

// Read-only view over two lists; no elements are copied.
sealed class Concatenated<T>:
    IReadOnlyList<T>
{
    // Not generic itself: it reuses the T of the enclosing class.
    public static Concatenated<T>
        Concatenate(
            IReadOnlyList<T> first,
            IReadOnlyList<T> second)
        => new Concatenated<T>(first, second);

    private readonly IReadOnlyList<T>
       first, second;

    private Concatenated(
        IReadOnlyList<T> first,
        IReadOnlyList<T> second)
    {
        this.first = first;
        this.second = second;
    }

    // IReadOnlyList<T> exposes Count, not Length.
    public T this[int index]
        => index < first.Count ?
           first[index]:
           second[index - first.Count];

    public int Count => first.Count + second.Count;

    public IEnumerator<T> GetEnumerator()
    {
        foreach (var f in first)
            yield return f;

        foreach (var s in second)
            yield return s;
    }

    IEnumerator IEnumerable.GetEnumerator()
        => GetEnumerator();
}
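
To illustrate a search that bridges the two arrays, here is a minimal sketch. It assumes a stripped-down view class named ConcatView<T> and a naive helper named IndexOfRun; both names are made up for this example. It also uses ArraySegment<T>, which implements IReadOnlyList<T>, to express "skip the first half" without copying.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical read-only view over two lists; no elements are copied.
sealed class ConcatView<T> : IReadOnlyList<T>
{
    private readonly IReadOnlyList<T> first, second;

    public ConcatView(IReadOnlyList<T> first, IReadOnlyList<T> second)
    {
        this.first = first;
        this.second = second;
    }

    public int Count => first.Count + second.Count;

    public T this[int index]
        => index < first.Count ? first[index] : second[index - first.Count];

    public IEnumerator<T> GetEnumerator()
    {
        foreach (var f in first) yield return f;
        foreach (var s in second) yield return s;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

static class BridgingSearchDemo
{
    // Returns the index of the first occurrence of pattern in list, or -1.
    // Naive O(n * m) scan, kept simple on purpose.
    public static int IndexOfRun(IReadOnlyList<int> list, int[] pattern)
    {
        for (int start = 0; start + pattern.Length <= list.Count; start++)
        {
            int k = 0;
            while (k < pattern.Length && list[start + k] == pattern[k]) k++;
            if (k == pattern.Length) return start;
        }
        return -1;
    }

    static void Main()
    {
        int[] a = { 0, 1, 2, 3, 4, 5 };
        int[] b = { 0, 1, 2, 3, 4, 5 };

        // Views over the second half of each array (offset 3, length 3).
        IReadOnlyList<int> tailA = new ArraySegment<int>(a, 3, 3);
        IReadOnlyList<int> tailB = new ArraySegment<int>(b, 3, 3);

        // Logically 3, 4, 5, 3, 4, 5, but backed by the original arrays.
        var view = new ConcatView<int>(tailA, tailB);

        // The run 5, 3 straddles the boundary between the two arrays.
        Console.WriteLine(IndexOfRun(view, new[] { 5, 3 })); // 2
    }
}
```

Because the search only goes through the indexer, it does not need to know where one array ends and the other begins.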

Answer 4

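Use Enumerable.Concat(). In the source, you can see that ConcatIterator first yields all items from first and then all items from second. It does not copy the original IEnumerables (arrays, in this case) but uses references to them.
(Note: if you want maximum speed and have many small IEnumerables, you shouldn't do this, but for minimal memory consumption with a few large IEnumerables it works.)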
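
The behaviour described above can be sketched with a plain iterator method. This is a simplified stand-in written for illustration, not the actual LINQ source; the names ConcatSketch and ConcatSketchDemo are assumptions.

```csharp
using System;
using System.Collections.Generic;

static class ConcatSketchDemo
{
    // Simplified stand-in for a concat iterator; it holds references
    // to both sequences and never copies their elements.
    public static IEnumerable<T> ConcatSketch<T>(
        IEnumerable<T> first, IEnumerable<T> second)
    {
        foreach (T item in first) yield return item;   // all items of first
        foreach (T item in second) yield return item;  // then all items of second
    }

    static void Main()
    {
        int[] a = { 1, 2 };
        int[] b = { 3, 4 };

        foreach (int x in ConcatSketch(a, b))
            Console.Write(x + " "); // 1 2 3 4
        Console.WriteLine();
    }
}
```

Each call wraps its inputs in one more iterator, so chaining many of them adds a layer of indirection per call, which is presumably the speed concern behind the note above.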

Answer 5

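As InBetween said, you really shouldn't be creating a new list. I imagine his solution is the "best" one.

As for your original question, you are going to run into some problems because of how garbage collection works in .NET (https://docs.microsoft.com/en-us/dotnet/standard/garbage-collection/fundamentals).

To work around this, the best approach is to take control of your memory usage yourself by keeping the lifetime of the large arrays as short as possible, rather than holding them in long-lived containers.

Below are some examples of handling the allocations so that scope gives you tighter control over memory: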

    // LARGE_COUNT is assumed to be a constant defined elsewhere.
    // The parameter is List<int> because IList<int> has no AddRange method.
    void MyFunc(List<int> combinedList)
    {
      int[] a = new int[LARGE_COUNT]; // Elements are initialized to default(int), which is 0.
      int[] b = new int[LARGE_COUNT];

      // Add whatever you want to combinedList. This will just add both.
      combinedList.AddRange(a);
      combinedList.AddRange(b);
    }

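In the snippet above, a and b become unreachable as soon as MyFunc returns, because they are locals that nothing else references. Note that arrays are reference types and live on the managed heap, so their memory is reclaimed the next time the garbage collector runs, not at the closing brace.

There is also a more heavy-handed way to do it.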

    List<int> concat = new List<int>();
    {
        // Plain block instead of using: int[] is not IDisposable, so a using statement would not compile.
        int[] a = Enumerable.Range(0, 1000 * 1024 * 1024 / 4).ToArray();
        concat.AddRange(a.Skip(500 * 1024 * 1024 / 4));
    }
    {
        int[] b = Enumerable.Range(0, 1000 * 1024 * 1024 / 4).ToArray();
        concat.AddRange(b.Skip(500 * 1024 * 1024 / 4));
    }
    // Do a GC.Collect() if you really don't want to put this in its own scope for some reason.

GC.Collect() is a very aggressive approach compared with learning to set things up so that .NET's garbage collection works the way it is intended to.
