C# List<> Add() method performance

Question

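I am using a List<> collection and adding new objects to it inside 2 nested loops. When the loops finish, about 500,000 items have been added to the collection.

At first the add operation runs well, but a performance drop soon becomes noticeable, and for the last few thousand elements the delay is unbearable.

I tried various tricks (initializing the collection with a specific size, 500000, and replacing List<> with a LinkedList<> collection), but they didn't help much.

Can you suggest a way to solve the problem? I am interested in switching to a more optimized structure, for example a LinkedList<>, if it performs better than List<> for operations such as adding.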

Method that updates the list

    private void UpdateForecastList(ConcurrentDictionary<Int32, RegistroSalidaProductoPrevision> prediccion, bool soloMejoresMetodos = true)
    {
        foreach (KeyValuePair<int, RegistroSalidaProductoPrevision> kvp in prediccion)
        {
            KeyValuePair<int, RegistroSalidaProductoPrevision> localKvp = kvp;

            IList<Prediccion> pExistente = prediccionList.Where(p => p.Id == localKvp.Key).ToList();

            Articulo articulo = (articuloList.Where(a => a.Id == localKvp.Key)).First();

            if (pExistente.Count > 0)
            {
                foreach (var p in pExistente)
                {
                    prediccionList.Remove(p);
                }
            }

            if (kvp.Value.Previsiones.Count > 0)
            {
                var previsiones = kvp.Value.Previsiones.Where(prevision => prevision.Value.LPrevision[1] != null).ToList();
                int previsionesCount = previsiones.Count;

                for (int a = 0; a < previsionesCount; a++)
                {
                    var registros = previsiones[a].Value.LPrevision[1].Serie;
                    int c = registros.Count;

                    if (soloMejoresMetodos)
                    {
                        if (localKvp.Value.MejorMetodo != previsiones[a].Key) continue;
                        for (int i = 0; i < c; i++)
                        {
                            var p = new Prediccion()
                                        {
                                            Id = articulo.Id,
                                            Nombre = articulo.Codigo,
                                            Descripcion = articulo.Descripcion,
                                            NombreMetodo =
                                                Utils.SplitStringByCapitals(previsiones[a].Value.NombreMetodo),
                                            Fecha = registros[i].Fecha,
                                            PrediccionArticulo = Math.Round(registros[i].Cantidad, 2),
                                            EsMejorMetodo =
                                                (previsiones[a].Value.NombreMetodo == localKvp.Value.MejorMetodo)
                                                    ? true
                                                    : false
                                        };

                            // This line experiences performance loss
                            prediccionList.Add(p);
                        }
                    }
                    else
                    {
                        for (int i = 0; i < c; i++)
                        {
                            prediccionList.Add(new Prediccion()
                                                   {
                                                       Id = articulo.Id,
                                                       Nombre = articulo.Codigo,
                                                       Descripcion = articulo.Descripcion,
                                                       NombreMetodo = previsiones[a].Value.NombreMetodo,
                                                       Fecha = registros[i].Fecha,
                                                       PrediccionArticulo =
                                                           Math.Round(registros[i].Cantidad, 2),
                                                       EsMejorMetodo =
                                                           (previsiones[a].Value.NombreMetodo ==
                                                            localKvp.Value.MejorMetodo)
                                                               ? true
                                                               : false
                                                   });
                        }
                    }
                }
            }
            else
            {
                prediccionList.Add(new Prediccion()
                                       {
                                           Id = articulo.Id,
                                           Nombre = articulo.Codigo,
                                           Descripcion = articulo.Descripcion,
                                           NombreMetodo = kvp.Value.ErroresDatos[0].Texto,
                                       });
            }
        }
    }

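A short description of the method: it reads an object (a concurrent dictionary) and updates the list (in this case a LinkedList) with the forecasts that correspond to a particular article.

The concurrent dictionary is constantly updated by various threads that access it simultaneously.

The list is initialized with an empty forecast for every article; so, for example, if you have 700 articles, the list starts out filled with 700 blank forecasts.

When one of the computing threads updates the concurrent dictionary, it raises an event that calls the method above, which then updates the list (prediccionList).

The maximum number of records prediccionList can hold (in this case) is about 500,000, but the performance loss becomes noticeable after about 40,000 records have been added.

The code may look a bit odd because I tried various optimization tricks (replacing foreach with for, computing the counts outside the loops, replacing the List<> object with LinkedList<>, etc.). In the end I concluded that the part slowing down execution is "prediccionList.Add(p);".

The objects added to the list are instances of the Prediccion class; I don't think this object is very heavy, since it contains only 7 fields.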

Memory usage

I have attached the results of a memory profiling session. Memory usage never exceeds 256 MB, so I don't believe memory should be the problem.

Answer 1

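In my experience, List<T> performance depends on memory. It always follows the same pattern: insertions are fast up to a certain point, and then performance drops sharply. On my machine this usually happens when I reach the 1.2 GB memory mark. Almost every collection I have tried has the same problem, so I think it is more of an underlying .NET issue than a List<T> issue.

I suggest trying to reduce the size of the objects you create 500,000 of (replacing longs with ints, and so on) and then trying again.
But be aware that even if you manage to make it fast on your machine, the application may still cross the threshold on the machines it is deployed to.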
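To put a number on "reduce the object size", the sketch below compares two hypothetical record layouts that differ only in using int instead of long for two fields. `WideRecord`, `NarrowRecord` and `RecordSizes` are illustrative names, not types from the question. It relies on `Marshal.SizeOf<T>()`, which reports the unmanaged size; for simple blittable structs like these it normally matches the managed payload, but treat it as an approximation.

```csharp
using System.Runtime.InteropServices;

// Hypothetical records: same three fields, but the narrow one uses int instead of long.
[StructLayout(LayoutKind.Sequential)]
public struct WideRecord { public long A; public long B; public double Value; }

[StructLayout(LayoutKind.Sequential)]
public struct NarrowRecord { public int A; public int B; public double Value; }

public static class RecordSizes
{
    // 8 + 8 + 8 bytes of fields.
    public static int Wide => Marshal.SizeOf<WideRecord>();

    // 4 + 4 + 8 bytes of fields (no padding needed before the double).
    public static int Narrow => Marshal.SizeOf<NarrowRecord>();
}
```

At 500,000 elements that is roughly 12 MB versus 8 MB of raw field data. For a class like Prediccion, the per-object header and the string fields it references also count, so the saving from narrower numeric fields alone will be smaller.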

Answer 2

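The problem has nothing to do with the performance of List or any other .NET data structure. Your problem is purely algorithmic. For example, you have this code fragment: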

    foreach (KeyValuePair<int, RegistroSalidaProductoPrevision> kvp in prediccion)
    {
        KeyValuePair<int, RegistroSalidaProductoPrevision> localKvp = kvp;

        IList<Prediccion> pExistente = prediccionList.Where(p => p.Id == localKvp.Key).ToList();

        Articulo articulo = (articuloList.Where(a => a.Id == localKvp.Key)).First();

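So for every item in the dictionary (prediccion) you walk the entire prediccionList. You have implemented an n^2 algorithm: the time needed to execute the method is proportional to prediccion.Count * prediccionList.Count.

You need a better algorithm, not a faster collection data structure.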
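A minimal sketch of what a better algorithm could look like, assuming an article's forecasts are always replaced as a whole: keep the predictions grouped in a Dictionary<int, List<Prediccion>> keyed by article Id, and the articles in a Dictionary<int, Articulo>, so each update touches only that article's rows instead of scanning (and removing from) the whole prediccionList. The `Articulo` and `Prediccion` types here are hypothetical, trimmed-down stand-ins for the asker's classes, and `PrediccionIndex`/`Replace` are illustrative names. The class is not thread-safe; since the method is driven by events from several worker threads, calls would still need a lock or a single consumer thread.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-ins for the asker's types.
public sealed class Articulo
{
    public int Id;
    public string Codigo = "";
}

public sealed class Prediccion
{
    public int Id;
    public string Nombre = "";
    public DateTime Fecha;
    public double PrediccionArticulo;
}

// Groups predictions by article Id so that replacing one article's forecasts
// costs O(rows for that article) instead of O(all rows).
public sealed class PrediccionIndex
{
    private readonly Dictionary<int, Articulo> articulosById = new Dictionary<int, Articulo>();
    private readonly Dictionary<int, List<Prediccion>> prediccionesById = new Dictionary<int, List<Prediccion>>();

    public PrediccionIndex(IEnumerable<Articulo> articulos)
    {
        // One O(n) pass up front replaces a Where(...).First() scan per update.
        foreach (var a in articulos) articulosById[a.Id] = a;
    }

    // Replaces every prediction for one article with the rows built from the new series.
    public void Replace(int articuloId, IReadOnlyList<(DateTime Fecha, double Cantidad)> serie)
    {
        Articulo articulo = articulosById[articuloId];   // O(1) lookup
        var rows = new List<Prediccion>(serie.Count);    // presized, so no regrowth
        for (int i = 0; i < serie.Count; i++)
        {
            var registro = serie[i];                       // index once, reuse the local
            rows.Add(new Prediccion
            {
                Id = articulo.Id,
                Nombre = articulo.Codigo,
                Fecha = registro.Fecha,
                PrediccionArticulo = Math.Round(registro.Cantidad, 2)
            });
        }
        prediccionesById[articuloId] = rows;             // old rows dropped in O(1)
    }

    public int Count(int articuloId) =>
        prediccionesById.TryGetValue(articuloId, out var rows) ? rows.Count : 0;

    public int TotalCount
    {
        get { int n = 0; foreach (var rows in prediccionesById.Values) n += rows.Count; return n; }
    }
}
```

Swapping in a new List<Prediccion> for an article avoids List<T>.Remove entirely, which has to search the list and shift the remaining elements on every call.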

Answer 3

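If the objects you are adding to the list are of any significant size, you may be running into memory limits.

If your process is 32-bit, you are limited to 2 GB in total before you run out of address space; if it is 64-bit, you can easily exceed the physical memory in the machine and start paging to disk.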
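A quick way to check which case applies is to ask the running process, using the standard `Environment.Is64BitProcess` and `IntPtr.Size` members. `ProcessInfo` is just an illustrative wrapper; in Visual Studio the project's "Platform target" and "Prefer 32-bit" settings decide which kind of process you get.

```csharp
using System;

public static class ProcessInfo
{
    // True when this process runs as 64-bit; a 32-bit process is bounded by its
    // address space regardless of how much RAM the machine has.
    public static bool Is64Bit => Environment.Is64BitProcess;

    // Pointer size in bytes: 8 in a 64-bit process, 4 in a 32-bit one.
    public static int PointerSize => IntPtr.Size;

    public static string Describe() =>
        (Is64Bit ? "64-bit" : "32-bit") + " process, pointer size " + PointerSize + " bytes";
}
```

Logging `ProcessInfo.Describe()` once at startup makes it obvious whether the 2 GB ceiling is even in play.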

How big are your objects?

Answer 4

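As your list gets bigger, every time it expands or is garbage collected, the framework copies its contents to a new location, because of the way the garbage collector works. That is why it gets slower and slower as it grows. (GC on MSDN)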
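To see how often a List<T> actually expands, the sketch below counts how many times `Capacity` changes while items are added. Each change means the list allocated a larger internal array and copied the existing elements into it. It assumes the behavior of the standard .NET List<T>, whose capacity grows geometrically (in current implementations it starts at 4 and doubles); `ListGrowthDemo` and `CountReallocations` are illustrative names.

```csharp
using System.Collections.Generic;

public static class ListGrowthDemo
{
    // Adds `count` items and returns how many times Capacity changed, i.e. how many
    // times the list had to allocate a bigger array and copy the old elements across.
    public static int CountReallocations(int count, int initialCapacity = 0)
    {
        var list = new List<int>(initialCapacity);
        int reallocations = 0;
        int lastCapacity = list.Capacity;
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                reallocations++;
                lastCapacity = list.Capacity;
            }
        }
        return reallocations;
    }
}
```

Because growth doubles the capacity, 500,000 adds to a default-constructed list cause fewer than twenty reallocations (capacity 4, 8, 16, ... up to 524,288), and `CountReallocations(500_000, 500_000)` causes none.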

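Possible solutions (that I can think of) are to use a list or array with a predefined size that you are sure won't fill up or, if that is not an option, to use System.Collections.Generic.LinkedList. Since you have already tried that, you may have to implement a custom list, singly linked if that fits your use (LinkedList is doubly linked).

To improve your chances of getting a good answer, you should post the code of the objects you are collecting, as well as the part where you add items, so we can better understand what is going on.

Also, have a look at http://www.simple-talk.com/dotnet/performance/the-top-5-.net-memory-management-misconceptions/, which I think will help you.

Update: indexing should be a cheap operation, but you could try reading previsiones[a] (and registros[i] in the nested loop) into local variables at the start of each loop iteration. That saves a few indexing operations, which over 100,000 iterations might make a difference if the CLR doesn't already optimize them away.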

Answer 5

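Using a struct instead of a class can significantly improve your performance.

You can also gain performance by dropping the string properties from the Prediccion class/struct.

I had been curious about the actual impact for a long time, so here is my benchmark:

I took different data structures and put 20 million objects/structs into each of them. Here are the results: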

List:
Adding 20000000 TestClass to a List`1 took 3563,2068 ms
Accessing 20000000 TestClass from a List took 103,0203 ms
Adding 20000000 TestStruct to a List`1 took 2239,9639 ms
Accessing 20000000 TestStruct from a List took 254,3245 ms

Initialized List:
Adding 20000000 TestClass to a List`1 took 3774,772 ms
Accessing 20000000 TestClass from a List took 99,0548 ms
Adding 20000000 TestStruct to a List`1 took 1520,7765 ms
Accessing 20000000 TestStruct from a List took 257,5064 ms

LinkedList:
Adding 20000000 TestClass to a LinkedList`1 took 6085,6478 ms
Adding 20000000 TestStruct to a LinkedList`1 took 7771,2243 ms

HashSet:
Adding 20000000 TestClass to a HashSet`1 took 10816,8488 ms
Adding 20000000 TestStruct to a HashSet`1 took 3694,5187 ms

Now I added a string to the class/struct:
List:
Adding 20000000 TestClassWithString to a List`1 took 4925,1215 ms
Accessing 20000000 TestClassWithString from a List took 120,0348 ms
Adding 20000000 TestStructWithString to a List`1 took 3554,7463 ms
Accessing 20000000 TestStructWithString from a List took 456,3299 ms

Here is my test program:

    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.Diagnostics;

    class Program
    {
        static void Main(string[] args)
        {
            const int noObjects = 20*1000*1000;

            Console.WriteLine("List:");
            RunTest(new List<TestClass>(), noObjects);
            RunTest(new List<TestStruct>(), noObjects);
            Console.WriteLine();

            Console.WriteLine("Initialized List:");
            RunTest(new List<TestClass>(noObjects), noObjects);
            RunTest(new List<TestStruct>(noObjects), noObjects);
            Console.WriteLine();

            Console.WriteLine("LinkedList:");
            RunTest(new LinkedList<TestClass>(), noObjects);
            RunTest(new LinkedList<TestStruct>(), noObjects);
            Console.WriteLine();

            Console.WriteLine("HashSet:");
            RunTest(new HashSet<TestClass>(), noObjects);
            RunTest(new HashSet<TestStruct>(), noObjects);
            Console.WriteLine();

            Console.WriteLine("Now I added a string to the class/struct:");
            Console.WriteLine("List:");
            RunTest(new List<TestClassWithString>(), noObjects);
            RunTest(new List<TestStructWithString>(), noObjects);
            Console.WriteLine();

            Console.ReadLine();
        }

        // Times adding noObjects freshly created items to the collection and,
        // for indexable collections, reading every item back by index.
        private static void RunTest<T>(ICollection<T> collection, int noObjects) where T : ITestThing
        {
            Stopwatch sw = new Stopwatch();
            sw.Restart();
            for (int i = 0; i < noObjects; i++)
            {
                var obj = Activator.CreateInstance<T>();
                obj.Initialize();
                collection.Add(obj);
            }
            sw.Stop();
            Console.WriteLine("Adding " + noObjects + " " + typeof(T).Name + " to a " + collection.GetType().Name + " took " + sw.Elapsed.TotalMilliseconds + " ms");

            // List<T> implements the non-generic IList; LinkedList<T> and HashSet<T> do not.
            if (collection is IList)
            {
                IList list = (IList) collection;
                // access all list objects
                sw.Restart();
                for (int i = 0; i < noObjects; i++)
                {
                    var obj = list[i];
                }
                sw.Stop();
                Console.WriteLine("Accessing " + noObjects + " " + typeof (T).Name + " from a List took " + sw.Elapsed.TotalMilliseconds + " ms");
            }
        }
    }

TestClass and TestStruct both look like this (one declared with 'class', the other with 'struct'):

    public interface ITestThing
    {
        void Initialize();
    }

    public class TestClass : ITestThing
    {
        public int I1;
        public int I2;
        public double D1;
        public double D2;
        public long L1;
        public long L2;

        public void Initialize()
        {
            D1 = 1;
            D2 = 2;
            I1 = 3;
            I2 = 4;
            L1 = 5;
            L2 = 6;
        }
    }

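The only differences are that TestStruct is declared as public struct instead of public class, and that TestClassWithString and TestStructWithString additionally have a public string S1 initialized with "abc".

ITestThing is only there because structs cannot have a parameterless constructor, so I needed some way to call the Initialize() method generically; it turns out, though, that it makes little difference whether I call Initialize() or not.

Note that if I had written the code for each test case without any interface or Activator.CreateInstance, the differences in duration would have been even more extreme, but as soon as I added a second test case the code became too big...

Summary

You can improve performance dramatically by using a List with an initial size and putting structs into it instead of class instances (objects). Also try to avoid strings in your structs, because every String instance is itself an object, which is exactly what you were trying to avoid by using a struct instead of an object.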

Answer 6

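How about using an array instead of a List? You can create it with an initial size (say 500,000 elements) and, if that turns out not to be enough, use Array.Resize to add another 100,000. You just need to keep track of the actual number of elements yourself, because the Length property only gives you the size of the array, not how many slots are in use.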

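Be aware, though, that the Array.Resize call can also be very time-consuming, because it essentially creates a new array of the new size and copies all elements of the original array into it. You shouldn't call it too often.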
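A minimal sketch of the array approach described above: an array plus a separate element count, grown with `Array.Resize` in fixed-size chunks. `ChunkedBuffer<T>` is a hypothetical name, and the 500,000/100,000 defaults simply mirror the numbers in the answer.

```csharp
using System;

// An array plus a separate count, grown in fixed-size chunks with Array.Resize.
public sealed class ChunkedBuffer<T>
{
    private T[] items;
    private readonly int growBy;

    public int Count { get; private set; }   // elements actually stored
    public int Capacity => items.Length;     // allocated slots (what Length reports)

    public ChunkedBuffer(int initialSize = 500_000, int growBy = 100_000)
    {
        if (initialSize < 0) throw new ArgumentOutOfRangeException(nameof(initialSize));
        if (growBy <= 0) throw new ArgumentOutOfRangeException(nameof(growBy));
        items = new T[initialSize];
        this.growBy = growBy;
    }

    public void Add(T item)
    {
        if (Count == items.Length)
        {
            // Allocates a new array and copies every existing element, so keep this rare.
            Array.Resize(ref items, items.Length + growBy);
        }
        items[Count++] = item;
    }

    public T this[int index]
    {
        get
        {
            if ((uint)index >= (uint)Count) throw new ArgumentOutOfRangeException(nameof(index));
            return items[index];
        }
    }
}
```

Growing by a fixed chunk keeps the memory overhead small but means each resize copies more data than the last; when the final size is unknown, growing by a factor (as List<T> does) keeps the total copying proportional to the number of elements.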

Answer 7

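Have you tried initializing the capacity? That way the list doesn't need to reallocate memory and copy the old contents to the new memory block.

    List<long> thelist = new List<long>(500000);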

Answer 8

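You could use arrays, which are faster (but not queryable). I don't know the details of your code, but you might want to refactor it and use a database. 500,000 items will never be fast.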
