How do I create an efficient data structure for two-dimensional LINQ operations?

Time: 2013-06-03 07:28:09

Tags: c# .net performance linq

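Problem: I have two kinds of objects; let's call them Building and Improvement. There are roughly 30 Improvement instances, while there can be anywhere from 1 to 1000 Buildings. For each combination of Building and Improvement, I have to perform some heavy calculation and store the result in a Result object.

Both Buildings and Improvements can be represented by an integer ID.

I then need to be able to:

  • Access the Result for a given Building and Improvement efficiently (EDIT: see the note further down)
  • Perform aggregations, such as .Sum() and .Average(), on the Results for all Improvements for a given Building
  • Perform the same aggregations on the Results for all Buildings for a given Improvement

This will happen on a web server back end, so memory may be a concern, but speed matters most.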

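Thoughts so far:

  1. Use a Dictionary<Tuple<int, int>, Result> with <BuildingID, ImprovementID> as the key. This should give me fast inserts and single lookups, but I am concerned about .Where() and .Sum() performance.
  2. Use a two-dimensional array, with one dimension for BuildingIDs and one for ImprovementIDs, and the Result as the value. In addition, build two Dictionary<int, int>s that map BuildingIDs and ImprovementIDs to their respective array row/column indexes. This could potentially mean up to 1000+ Dictionarys; would that be a problem?
  3. Use a List<Tuple<int, int, Result>>. I think this may be the least efficient, with O(n) inserts, though I could be wrong.
  4. Am I missing an obviously better option here?

    EDIT: It turns out that I am only interested in the aggregated values (per Building and per Improvement); see my answer.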
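
For reference, option 2 above could look roughly like the following. This is only a minimal sketch, not code from the question: the class name ResultGrid and its members are hypothetical, and a plain double stands in for Result to keep it short. In this particular layout only two index dictionaries are needed, one per dimension.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: a dense grid of values addressed by external integer IDs.
// A double stands in for the question's Result type (an assumption for brevity).
sealed class ResultGrid
{
    private readonly double[,] _values;
    private readonly Dictionary<int, int> _rowByBuildingId = new Dictionary<int, int>();
    private readonly Dictionary<int, int> _colByImprovementId = new Dictionary<int, int>();

    public ResultGrid(IList<int> buildingIds, IList<int> improvementIds)
    {
        // Map each external ID to a row or column index.
        for (int i = 0; i < buildingIds.Count; ++i)
            _rowByBuildingId.Add(buildingIds[i], i);
        for (int j = 0; j < improvementIds.Count; ++j)
            _colByImprovementId.Add(improvementIds[j], j);

        _values = new double[buildingIds.Count, improvementIds.Count];
    }

    // Single lookup: two hash lookups plus one array access.
    public double this[int buildingId, int improvementId]
    {
        get { return _values[_rowByBuildingId[buildingId], _colByImprovementId[improvementId]]; }
        set { _values[_rowByBuildingId[buildingId], _colByImprovementId[improvementId]] = value; }
    }

    // Sum over all improvements for one building (walks one row).
    public double SumForBuilding(int buildingId)
    {
        int row = _rowByBuildingId[buildingId];
        double sum = 0;
        for (int col = 0; col < _values.GetLength(1); ++col)
            sum += _values[row, col];
        return sum;
    }

    // Sum over all buildings for one improvement (walks one column).
    public double SumForImprovement(int improvementId)
    {
        int col = _colByImprovementId[improvementId];
        double sum = 0;
        for (int row = 0; row < _values.GetLength(0); ++row)
            sum += _values[row, col];
        return sum;
    }
}
```

Both aggregations touch only the cells they need, so neither direction requires a scan of the whole data set. The trade-off is that the set of IDs has to be known before the grid is allocated.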

3 answers:

Answer 0 (score: 3)

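In general, a Dictionary is the most efficient structure for lookups. When accessed by key, both lookup and modification are O(1) on average. That covers your first requirement, single access.

For the second and third requirements you have to walk through all of the items, which is O(n), and there is no way around that. The only exception is if you need to walk them in a particular order: sorting on each pass costs O(n log n), whereas a SortedDictionary enumerates in key order in O(n), but at the cost of O(log n) lookups and modifications.

So I would go with the first solution you posted.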
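
To make the trade-off concrete, here is a minimal sketch of that first solution. The class and method names (TupleKeyDemo, Build, Lookup, SumForBuilding) are made up for illustration, and a double stands in for Result. The single lookup is one hash probe, while the per-building aggregation has to filter every entry in the dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo of a dictionary keyed by (BuildingID, ImprovementID).
static class TupleKeyDemo
{
    // Builds 3 buildings x 2 improvements; the value b * 10 + i is arbitrary test data.
    public static Dictionary<Tuple<int, int>, double> Build()
    {
        var results = new Dictionary<Tuple<int, int>, double>();
        for (int b = 1; b <= 3; ++b)
            for (int i = 1; i <= 2; ++i)
                results.Add(Tuple.Create(b, i), b * 10 + i);
        return results;
    }

    // Single lookup: Tuple<int, int> compares structurally, so a fresh tuple finds the entry.
    public static double Lookup(Dictionary<Tuple<int, int>, double> results, int buildingId, int improvementId)
    {
        return results[Tuple.Create(buildingId, improvementId)];
    }

    // Aggregation: Where() has to visit every entry, whatever the building.
    public static double SumForBuilding(Dictionary<Tuple<int, int>, double> results, int buildingId)
    {
        return results.Where(kv => kv.Key.Item1 == buildingId).Sum(kv => kv.Value);
    }
}
```

With 1000 buildings and 30 improvements, each call to SumForBuilding would scan all 30,000 entries to find the 30 it needs.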

Answer 1 (score: 2)

You could use a "dictionary of dictionaries" to hold the Result data, for example:

//             Building ID ↓               ↓ Improvement ID
var data = new Dictionary<int, Dictionary<int, Result>>();

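This lets you quickly find the improvements for a particular building.

However, finding the buildings that contain a particular improvement requires iterating over all the buildings. Here is some sample code: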

using System;
using System.Linq;
using System.Collections.Generic;

namespace Demo
{
    sealed class Result
    {
        public double Data;
    }

    sealed class Building
    {
        public int Id;
        public int Value;
    }

    sealed class Improvement
    {
        public int Id;
        public int Value;
    }

    class Program
    {
        void run()
        {
            //             Building ID ↓               ↓ Improvement ID
            var data = new Dictionary<int, Dictionary<int, Result>>();

            for (int buildingKey = 1000; buildingKey < 2000; ++buildingKey)
            {
                var improvements = new Dictionary<int, Result>();

                for (int improvementKey = 5000; improvementKey < 5030; ++improvementKey)
                    improvements.Add(improvementKey, new Result{ Data = buildingKey + improvementKey/1000.0 });

                data.Add(buildingKey, improvements);
            }

            // Aggregate data for all improvements for building with ID == 1500:

            int buildingId = 1500;
            var sum = data[buildingId].Sum(result => result.Value.Data);
            Console.WriteLine(sum);

            // Aggregate data for all buildings with a given improvement.

            int improvementId = 5010;

            sum = data.Sum(improvements =>
            {
                Result result;
                return improvements.Value.TryGetValue(improvementId, out result) ? result.Data : 0.0;
            });

            Console.WriteLine(sum);
        }

        static void Main()
        {
            new Program().run();
        }
    }
}

To speed up the second aggregation (summing the data for all buildings with a given improvement ID), we can use a second dictionary:

//                     Improvement ID ↓               ↓ Building ID
var byImprovementId = new Dictionary<int, Dictionary<int, Result>>();

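You would have an extra dictionary to maintain, but that is not too complicated. A few nested dictionaries like this might take up too much memory, but the approach is worth considering.

As noted in the comments below, it would be better to define types for the IDs and for the dictionaries themselves. Putting that together gives: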
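
One way to keep the maintenance cost low is to route every insert through a single method that updates both dictionaries. The following is only a sketch under that assumption: the ResultIndex class and its members are hypothetical, and a double stands in for Result. With a reference-type Result, both inner dictionaries would hold the same instance, so only the dictionary entries are duplicated, not the results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper that keeps the two nested dictionaries in sync.
sealed class ResultIndex
{
    // Building ID -> (Improvement ID -> value)
    private readonly Dictionary<int, Dictionary<int, double>> _byBuilding =
        new Dictionary<int, Dictionary<int, double>>();

    // Improvement ID -> (Building ID -> value)
    private readonly Dictionary<int, Dictionary<int, double>> _byImprovement =
        new Dictionary<int, Dictionary<int, double>>();

    // The only write path, so the two indexes cannot drift apart.
    public void Add(int buildingId, int improvementId, double value)
    {
        GetOrCreate(_byBuilding, buildingId).Add(improvementId, value);
        GetOrCreate(_byImprovement, improvementId).Add(buildingId, value);
    }

    public double SumForBuilding(int buildingId)
    {
        return _byBuilding[buildingId].Values.Sum();
    }

    public double SumForImprovement(int improvementId)
    {
        return _byImprovement[improvementId].Values.Sum();
    }

    // Returns the inner dictionary for a key, creating it on first use.
    private static Dictionary<int, double> GetOrCreate(
        Dictionary<int, Dictionary<int, double>> outer, int key)
    {
        Dictionary<int, double> inner;
        if (!outer.TryGetValue(key, out inner))
        {
            inner = new Dictionary<int, double>();
            outer.Add(key, inner);
        }
        return inner;
    }
}
```

Either aggregation then enumerates only the entries for the requested ID.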

using System;
using System.Linq;
using System.Collections.Generic;

namespace Demo
{
    sealed class Result
    {
        public double Data;
    }

    sealed class BuildingId
    {
        public BuildingId(int id)
        {
            Id = id;
        }

        public readonly int Id;

        public override int GetHashCode()
        {
            return Id.GetHashCode();
        }

        public override bool Equals(object obj)
        {
            var other = obj as BuildingId;

            if (other == null)
                return false;

            return this.Id == other.Id;
        }
    }

    sealed class ImprovementId
    {
        public ImprovementId(int id)
        {
            Id = id;
        }

        public readonly int Id;

        public override int GetHashCode()
        {
            return Id.GetHashCode();
        }

        public override bool Equals(object obj)
        {
            var other = obj as ImprovementId;

            if (other == null)
                return false;

            return this.Id == other.Id;
        }
    }

    sealed class Building
    {
        public BuildingId Id;
        public int Value;
    }

    sealed class Improvement
    {
        public ImprovementId Id;
        public int Value;
    }

    sealed class BuildingResults : Dictionary<BuildingId, Result>{}

    sealed class ImprovementResults: Dictionary<ImprovementId, Result>{}

    sealed class BuildingsById: Dictionary<BuildingId, ImprovementResults>{}

    sealed class ImprovementsById: Dictionary<ImprovementId, BuildingResults>{}

    class Program
    {
        void run()
        {
            var byBuildingId    = CreateTestBuildingsById();            // Create some test data.
            var byImprovementId = CreateImprovementsById(byBuildingId); // Create the alternative lookup dictionaries.

            // Aggregate data for all improvements for building with ID == 1500:

            BuildingId buildingId = new BuildingId(1500);

            var sum = byBuildingId[buildingId].Sum(result => result.Value.Data);
            Console.WriteLine(sum);

            // Aggregate data for all buildings with a given improvement.

            ImprovementId improvementId = new ImprovementId(5010);

            sum = byBuildingId.Sum(improvements =>
            {
                Result result;
                return improvements.Value.TryGetValue(improvementId, out result) ? result.Data : 0.0;
            });

            Console.WriteLine(sum);

            // Aggregate data for all buildings with a given improvement using byImprovementId.
            // This will be much faster than the above Linq.

            sum = byImprovementId[improvementId].Sum(result => result.Value.Data);
            Console.WriteLine(sum);
        }

        static BuildingsById CreateTestBuildingsById()
        {
            var byBuildingId = new BuildingsById();

            for (int buildingKey = 1000; buildingKey < 2000; ++buildingKey)
            {
                var improvements = new ImprovementResults();

                for (int improvementKey = 5000; improvementKey < 5030; ++improvementKey)
                {
                    improvements.Add
                    (
                        new ImprovementId(improvementKey),
                        new Result
                        {
                            Data = buildingKey + improvementKey/1000.0
                        }
                    );
                }

                byBuildingId.Add(new BuildingId(buildingKey), improvements);
            }

            return byBuildingId;
        }

        static ImprovementsById CreateImprovementsById(BuildingsById byBuildingId)
        {
            var byImprovementId = new ImprovementsById();

            foreach (var improvements in byBuildingId)
            {
                foreach (var improvement in improvements.Value)
                {
                    if (!byImprovementId.ContainsKey(improvement.Key))
                        byImprovementId[improvement.Key] = new BuildingResults();

                    byImprovementId[improvement.Key].Add(improvements.Key, improvement.Value);
                }
            }

            return byImprovementId;
        }

        static void Main()
        {
            new Program().run();
        }
    }
}

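Finally, here is a modified version that measures how long it takes to aggregate the data for every building/improvement combination for one specific improvement, and compares a dictionary of tuples against a dictionary of dictionaries.

My results for a RELEASE build, run outside any debugger: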

Dictionary of dictionaries took 00:00:00.2967741
Dictionary of tuples took 00:00:07.8164672

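Using the dictionary of dictionaries is significantly faster, but that only matters much if you intend to run many of these aggregations.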

using System;
using System.Diagnostics;
using System.Linq;
using System.Collections.Generic;

namespace Demo
{
    sealed class Result
    {
        public double Data;
    }

    sealed class BuildingId
    {
        public BuildingId(int id)
        {
            Id = id;
        }

        public readonly int Id;

        public override int GetHashCode()
        {
            return Id.GetHashCode();
        }

        public override bool Equals(object obj)
        {
            var other = obj as BuildingId;

            if (other == null)
                return false;

            return this.Id == other.Id;
        }
    }

    sealed class ImprovementId
    {
        public ImprovementId(int id)
        {
            Id = id;
        }

        public readonly int Id;

        public override int GetHashCode()
        {
            return Id.GetHashCode();
        }

        public override bool Equals(object obj)
        {
            var other = obj as ImprovementId;

            if (other == null)
                return false;

            return this.Id == other.Id;
        }
    }

    sealed class Building
    {
        public BuildingId Id;
        public int Value;
    }

    sealed class Improvement
    {
        public ImprovementId Id;
        public int Value;
    }

    sealed class BuildingResults : Dictionary<BuildingId, Result>{}

    sealed class ImprovementResults: Dictionary<ImprovementId, Result>{}

    sealed class BuildingsById: Dictionary<BuildingId, ImprovementResults>{}

    sealed class ImprovementsById: Dictionary<ImprovementId, BuildingResults>{}

    class Program
    {
        void run()
        {
            var byBuildingId    = CreateTestBuildingsById();            // Create some test data.
            var byImprovementId = CreateImprovementsById(byBuildingId); // Create the alternative lookup dictionaries.
            var testTuples      = CreateTestTuples();

            ImprovementId improvementId = new ImprovementId(5010);

            int count = 10000;

            Stopwatch sw = Stopwatch.StartNew();

            for (int i = 0; i < count; ++i)
                byImprovementId[improvementId].Sum(result => result.Value.Data);

            Console.WriteLine("Dictionary of dictionaries took " + sw.Elapsed);

            sw.Restart();

            for (int i = 0; i < count; ++i)
                testTuples.Where(result => result.Key.Item2.Equals(improvementId)).Sum(item => item.Value.Data);

            Console.WriteLine("Dictionary of tuples took " + sw.Elapsed);
        }

        static Dictionary<Tuple<BuildingId, ImprovementId>, Result> CreateTestTuples()
        {
            var result = new Dictionary<Tuple<BuildingId, ImprovementId>, Result>();

            for (int buildingKey = 1000; buildingKey < 2000; ++buildingKey)
                for (int improvementKey = 5000; improvementKey < 5030; ++improvementKey)
                    result.Add(
                        new Tuple<BuildingId, ImprovementId>(new BuildingId(buildingKey), new ImprovementId(improvementKey)),
                        new Result
                        {
                            Data = buildingKey + improvementKey/1000.0
                        });

            return result;
        }

        static BuildingsById CreateTestBuildingsById()
        {
            var byBuildingId = new BuildingsById();

            for (int buildingKey = 1000; buildingKey < 2000; ++buildingKey)
            {
                var improvements = new ImprovementResults();

                for (int improvementKey = 5000; improvementKey < 5030; ++improvementKey)
                {
                    improvements.Add
                    (
                        new ImprovementId(improvementKey),
                        new Result
                        {
                            Data = buildingKey + improvementKey/1000.0
                        }
                    );
                }

                byBuildingId.Add(new BuildingId(buildingKey), improvements);
            }

            return byBuildingId;
        }

        static ImprovementsById CreateImprovementsById(BuildingsById byBuildingId)
        {
            var byImprovementId = new ImprovementsById();

            foreach (var improvements in byBuildingId)
            {
                foreach (var improvement in improvements.Value)
                {
                    if (!byImprovementId.ContainsKey(improvement.Key))
                        byImprovementId[improvement.Key] = new BuildingResults();

                    byImprovementId[improvement.Key].Add(improvements.Key, improvement.Value);
                }
            }

            return byImprovementId;
        }

        static void Main()
        {
            new Program().run();
        }
    }
}

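Answer 2 (score: 0)

Thanks for the answers, the test code was really helpful :)

My solution turned out to be to drop LINQ and perform the aggregation manually, directly after the heavy calculation, since I have to iterate over every combination of building and improvement anyway.

Also, I had to use the objects themselves as keys, so that the calculations can run before the objects are persisted to Entity Framework (i.e. while their IDs are all 0).

Code: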

public class Building {
    public int ID { get; set; }
    ...
}

public class Improvement {
    public int ID { get; set; }
    ...
}

public class Result {
    public decimal Foo { get; set; }
    public long Bar { get; set; }
    ...

    public void Add(Result result) {
        Foo += result.Foo;
        Bar += result.Bar;
        ...
    }   
}

public class Calculator {
    public Dictionary<Building, Result> ResultsByBuilding;
    public Dictionary<Improvement, Result> ResultsByImprovement;

    public void CalculateAndAggregate(IEnumerable<Building> buildings, IEnumerable<Improvement> improvements) {
        ResultsByBuilding = new Dictionary<Building, Result>();
        ResultsByImprovement = new Dictionary<Improvement, Result>();
        foreach (var building in buildings) {
            foreach (var improvement in improvements) {
                Result result = DoHeavyCalculation(building, improvement);

                // Each dictionary gets its own accumulator. Storing `result` itself in both
                // would make them share one instance, so a later Add() on one total would
                // silently change the other. Assumes Result has a parameterless constructor.
                Result buildingTotal;
                if (!ResultsByBuilding.TryGetValue(building, out buildingTotal)) {
                    buildingTotal = new Result();
                    ResultsByBuilding[building] = buildingTotal;
                }
                buildingTotal.Add(result);

                Result improvementTotal;
                if (!ResultsByImprovement.TryGetValue(improvement, out improvementTotal)) {
                    improvementTotal = new Result();
                    ResultsByImprovement[improvement] = improvementTotal;
                }
                improvementTotal.Add(result);
            }
        }
    }
}

public static void Main() {
    var calculator = new Calculator();
    IList<Building> buildings = GetBuildingsFromRepository();
    IList<Improvement> improvements = GetImprovementsFromRepository();
    calculator.CalculateAndAggregate(buildings, improvements);
    DoStuffWithResults(calculator);
}

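I did it this way because I knew exactly which aggregates I wanted; if I needed a more dynamic approach, I would probably go with something like @MatthewWatson's dictionary of dictionaries.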
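
On the point about using the objects themselves as keys: this works for unsaved entities as long as the key class does not override Equals and GetHashCode, because Dictionary then falls back to reference identity. The sketch below illustrates that under this assumption; the cut-down Building class and the ReferenceKeyDemo helper are stand-ins for illustration, not the real entity classes.

```csharp
using System;
using System.Collections.Generic;

// Stand-in entity: no Equals/GetHashCode override, so keys compare by reference.
class Building
{
    public int ID { get; set; }
}

static class ReferenceKeyDemo
{
    // Two unsaved buildings share ID == 0 but are still distinct dictionary keys.
    public static int CountDistinctKeys()
    {
        var first = new Building();   // ID is 0 until persisted
        var second = new Building();  // ID is 0 as well

        var totals = new Dictionary<Building, decimal>();
        totals[first] = 1m;
        totals[second] = 2m;
        totals[first] += 10m;         // updates the existing entry for `first`

        return totals.Count;
    }
}
```

If the entity classes did override equality in terms of ID, all unsaved instances would collapse into a single key, so that is worth checking before relying on this.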