Which is the most efficient and most elegant: join/from or inline select?

Posted: 2014-12-04 13:28:10

Tags: c# linq-to-objects

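Looking at these two examples, which one is the "proper" way? Assume both work and both return the same result.

Is there a third way?

If the second is the better way, is it possible to move the three remaining inline selects (Prop4, Prop5 and Prop6) into the join/from clauses as well?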

var test1 = (from m in MasterList
            join m1 in Master1 on m.id equals m1.id
            select new 
            {
                Prop1 = m1,
                Prop2 = Master2.Where(x => x.id2 == m1.id2).SingleOrDefault(),
                Prop3 = Master3.Where(x => x.id2 == m1.id2).SingleOrDefault(),
                Prop4 = Master4.Where(x => x.id2 == m1.id2).ToList(),
                Prop5 = Master5.Where(x => x.id2 == m1.id2).ToList(),
                Prop6 = Master6.Where(x => x.id == m.id).Select(x => x.id2).SingleOrDefault(),
                Prop7 = m.id3,
                Prop8 = m.id4,
                Prop9 = m.id5,

            }).ToList();

var test2 = (from m in MasterList
            join m1 in Master1 on m.id equals m1.id
            join m2 in Master2 on m1.id2 equals m2.id2 into m2left
            from m2l in m2left.DefaultIfEmpty()
            join m3 in Master3 on m1.id2 equals m3.id2 into m3left
            from m3l in m3left.DefaultIfEmpty()
            select new
            {
                Prop1 = m1,
                Prop2 = m2l,
                Prop3 = m3l,
                Prop4 = Master4.Where(x => x.id2 == m1.id2).ToList(),
                Prop5 = Master5.Where(x => x.id2 == m1.id2).ToList(),
                Prop6 = Master6.Where(x => x.id == m.id).Select(x => x.id2).SingleOrDefault(),
                Prop7 = m.id3,
                Prop8 = m.id4,
                Prop9 = m.id5,

            }).ToList();

Answer (score: 1):

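If your inner lists can be of any real size, the second form is generally faster and far more scalable. This is the same subselect-versus-join performance question that is discussed at length in the SQL world.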

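Basically, at least for LINQ to Objects, nothing optimizes your subselects away, so each inner collection is enumerated again for every outer row. If the inner collections are of any size, that can get very expensive.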
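To make the repeated enumeration visible, here is a minimal sketch that counts how many inner elements each form inspects. The sizes are illustrative (not the benchmark's), and the counters are side effects added purely for this demonstration. Because `SingleOrDefault()` has to confirm there is no second match, the subselect scans the whole inner array for every outer row, while the group join reads the inner array once to build its lookup.

```csharp
using System;
using System.Linq;

// Illustrative sizes only (not the benchmark's).
int[] outer = Enumerable.Range(1, 1_000).ToArray();
int[] inner = Enumerable.Range(1, 5_000).ToArray();

// Subselect: the Where predicate counts every inner element it inspects.
long subselectVisits = 0;
var viaSubselect = outer
    .Select(o => inner.Where(i => { subselectVisits++; return i == o; }).SingleOrDefault())
    .ToList();

// Join: the Select counts each inner element as the join reads it.
long joinVisits = 0;
var viaJoin = (from o in outer
               join i in inner.Select(x => { joinVisits++; return x; }) on o equals i into matches
               from hit in matches.DefaultIfEmpty()
               select hit).ToList();

Console.WriteLine($"subselect inspected {subselectVisits:N0} inner elements");
Console.WriteLine($"join inspected {joinVisits:N0} inner elements");
```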

Have a look at the benchmarks for this simplified example; the code is below (these were run as a 32-bit process in Release mode).

Test for SUBSELECT master=100000;a=10000;b=10 took 12ms and returned 10000
Test for SUBSELECT master=100000;a=10000;b=100 took 7ms and returned 10000
Test for SUBSELECT master=100000;a=10000;b=1000 took 41ms and returned 10000
Test for SUBSELECT master=100000;a=10000;b=10000 took 387ms and returned 10000
Test for SUBSELECT master=100000;a=10000;b=100000 took 3803ms and returned 10000
Test for SUBSELECT master=100000;a=10000;b=1000000 took 38172ms and returned 10000

Test for JOIN master=100000;a=10000;b=10 took 14ms and returned 10000
Test for JOIN master=100000;a=10000;b=100 took 4ms and returned 10000
Test for JOIN master=100000;a=10000;b=1000 took 4ms and returned 10000
Test for JOIN master=100000;a=10000;b=10000 took 7ms and returned 10000
Test for JOIN master=100000;a=10000;b=100000 took 13ms and returned 10000
Test for JOIN master=100000;a=10000;b=1000000 took 297ms and returned 10000

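You can see that for the subselect, with everything else held fixed, the time grows linearly with the size of the inner set (b): once b is large, every tenfold increase in b costs roughly ten times as much (387 ms, 3803 ms, 38172 ms). The join version stays fairly flat until b grows larger than master and a, at which point the query time starts to rise.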
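A back-of-the-envelope count makes the trend concrete. This is a rough sketch, assuming the subselect scans all of b once for each of the 10,000 rows produced by the master/a join, while the join reads each input once and does one hash probe per row:

$$
T_{\text{subselect}} \approx |M \bowtie A| \cdot |B| = 10^{4} \cdot |B|,
\qquad
T_{\text{join}} \approx |M| + |A| + |B| = 10^{5} + 10^{4} + |B|
$$

For $|B| = 10^{6}$ that is about $10^{10}$ comparisons against roughly $1.11 \times 10^{6}$ element visits. Hashing and allocation cost more per element than a plain integer comparison, so the measured ratio need not match this count, but the shapes do: the first term is linear in $|B|$, and the second only grows noticeably once $|B|$ dominates $|M| + |A|$.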

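This is because the join keyword can perform a hash join: in LINQ to Objects, the inner sequence is enumerated once to build a hash-based lookup keyed on the join key, and the outer sequence is then streamed past it, so each side is enumerated only once (memory permitting). The cost therefore scales roughly with the size of the largest set rather than with the product of the set sizes. Again, there is plenty of discussion of this online in a database context.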
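The build-then-probe idea can be imitated by hand with `ToLookup`: enumerate the inner side once into a hash-based index, then probe it once per outer row. This is a minimal sketch on small illustrative arrays, not the framework's internal code; it is also one possible "third way" that keeps a per-row lookup inside the query without rescanning the inner list. Note that, unlike `SingleOrDefault()`, it would yield several rows (rather than throw) if a key matched more than one b.

```csharp
using System;
using System.Linq;

// Small illustrative inputs in the same master / a / b shape as the benchmark.
int[] master = Enumerable.Range(1, 100).ToArray();
int[] aList = Enumerable.Range(1, 10).ToArray();
int[] bList = Enumerable.Range(5, 20).ToArray();   // values 5..24

// Build phase: enumerate bList once and index it by key (a hash-based lookup).
ILookup<int, int> bByKey = bList.ToLookup(b => b);

// Probe phase: each joined row does a lookup instead of rescanning bList.
// A missing key yields an empty sequence, so DefaultIfEmpty() supplies the
// left-join default, the same 0 that SingleOrDefault() returns for an int.
var rows = (from m in master
            join a in aList on m equals a
            from b in bByKey[m].DefaultIfEmpty()
            select new { M = m, B = b }).ToList();

foreach (var r in rows)
    Console.WriteLine($"{r.M} -> {r.B}");
```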

I'll answer your list question shortly.

Sample code:

using System;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        // Only the size of the INNER set (b) varies; master and a stay fixed.
        TimedTest(100000, 10000, 10);
        TimedTest(100000, 10000, 100);
        TimedTest(100000, 10000, 1000);
        TimedTest(100000, 10000, 10000);
        TimedTest(100000, 10000, 100000);
        TimedTest(100000, 10000, 1000000);
        Console.ReadLine();
    }

    static void TimedTest(int masterSize, int aSize, int bSize)
    {
        var masterList = Enumerable.Range(1, masterSize).ToArray();
        var aList = Enumerable.Range(1, aSize).ToArray();
        var bList = Enumerable.Range(1, bSize).ToArray();
        var w = new Stopwatch();

        // Subselect: bList is scanned once for every row produced by the join.
        w.Restart();
        var x = (from m in masterList
                 join a in aList on m equals a
                 select new { B = bList.Where(b => b == m).SingleOrDefault() }).ToList();
        w.Stop();
        Console.WriteLine("Test for SUBSELECT master={0};a={1};b={2} took {3}ms and returned {4}", masterSize, aSize, bSize, w.ElapsedMilliseconds, x.Count);

        // Join: bList is enumerated once to build a hash lookup, then probed per row.
        w.Restart();
        var y = (from m in masterList
                 join a in aList on m equals a
                 join b in bList on a equals b into bLeft
                 from bl in bLeft.DefaultIfEmpty()
                 select new { B = bl }).ToList();
        w.Stop();
        Debug.Assert(x.SequenceEqual(y)); // both forms must return the same rows
        Console.WriteLine("Test for JOIN master={0};a={1};b={2} took {3}ms and returned {4}", masterSize, aSize, bSize, w.ElapsedMilliseconds, y.Count);
    }
}

For the 'list' items (Prop4 and Prop5), you can group the inner collection before joining, like this:

static void Example()
{
    var masterSize = 10000;
    var aSize = 1000;
    var bSize = 100000;
    var masterList = Enumerable.Range(1, masterSize).ToArray();
    var aList = Enumerable.Range(1, aSize).ToArray();
    // Every value appears twice, so each matched key has a list of two b's.
    var bList = Enumerable.Range(1, bSize).Concat(Enumerable.Range(1, bSize)).ToArray();
    var w = new Stopwatch();

    // Subselect: a full scan of bList for every joined row.
    w.Restart();
    var x = (from m in masterList
             join a in aList on m equals a
             select new { A = a, M = m, B = bList.Where(b => b == m).ToList() }).ToList();
    w.Stop();
    Console.WriteLine("Test for SUBSELECT master={0};a={1};b={2} took {3}ms and returned {4}", masterSize, aSize, bSize, w.ElapsedMilliseconds, x.Count);

    // Join: group bList by key once, then left-join the groups.
    w.Restart();
    var y = (from m in masterList
             join a in aList on m equals a
             join g in (from b in bList group b by b) on m equals g.Key into bLeft
             from bl in bLeft.DefaultIfEmpty()
             // bl is null when no group matches, so fall back to an empty sequence
             select new { A = a, M = m, B = (bl ?? Enumerable.Empty<int>()).ToList() }).ToList();
    w.Stop();
    Debug.Assert(x.Select(i => new { A = i.A, BC = i.B.Sum() }).SequenceEqual(y.Select(i => new { A = i.A, BC = i.B.Sum() })));
    Console.WriteLine("Test for JOIN master={0};a={1};b={2} took {3}ms and returned {4}", masterSize, aSize, bSize, w.ElapsedMilliseconds, y.Count);
}
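As an alternative to pre-grouping, a plain group join (`join … into`) already collects all matching inner rows for each outer row and yields an empty sequence when nothing matches, so neither the `group by` nor the null check is strictly required. A minimal sketch on small illustrative arrays (not the benchmark sizes):

```csharp
using System;
using System.Linq;

int[] masterList = Enumerable.Range(1, 20).ToArray();
int[] aList = Enumerable.Range(1, 10).ToArray();
// Values 3..12, each appearing twice; keys 1 and 2 have no match.
int[] bList = Enumerable.Range(3, 10).Concat(Enumerable.Range(3, 10)).ToArray();

// "join ... into" is a group join: bMatches holds every matching b for the row
// and is simply empty when there is none, so no group-by or null check is needed.
var y = (from m in masterList
         join a in aList on m equals a
         join b in bList on m equals b into bMatches
         select new { A = a, M = m, B = bMatches.ToList() }).ToList();

foreach (var row in y)
    Console.WriteLine($"{row.A}: [{string.Join(", ", row.B)}]");
```

Whether this is faster than pre-grouping in practice is worth measuring with the benchmark above; both build a single lookup over b.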

That is, applied to your query:

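....
            join g4 in (from x in Master4 group x by x.id2) on m1.id2 equals g4.Key into m4left
....
            // the keys are unique, so m4left holds at most one group; flattening it
            // gives an empty list (not null) when no Master4 row has this id2
            Prop4 = m4left.SelectMany(g => g).ToList()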
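Putting it together, here is a sketch of the question's query with the inline selects for Prop2, Prop4 and Prop6 moved into joins (Prop3 and Prop5 follow the same two patterns). The collections are hypothetical stand-ins built from anonymous types holding only the fields the query touches, since the real element types are not shown in the question. One caveat: `SingleOrDefault()` throws when more than one row matches, whereas a left join silently produces one output row per match, so the two forms only agree when those keys are unique.

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for the question's collections (anonymous types are
// reference types, so DefaultIfEmpty() yields null for a missing match, as it would for classes).
var MasterList = new[] { new { id = 1, id3 = "x" }, new { id = 2, id3 = "y" } };
var Master1 = new[] { new { id = 1, id2 = 10 }, new { id = 2, id2 = 20 } };
var Master2 = new[] { new { id2 = 10, name = "m2-a" } };                    // nothing for id2 = 20
var Master4 = new[] { new { id2 = 10, v = 1 }, new { id2 = 10, v = 2 } };   // two rows for id2 = 10
var Master6 = new[] { new { id = 1, id2 = 100 } };                          // nothing for id = 2

var test3 = (from m in MasterList
             join m1 in Master1 on m.id equals m1.id
             // Prop2 (was Where(...).SingleOrDefault()): left join
             join m2 in Master2 on m1.id2 equals m2.id2 into m2left
             from m2l in m2left.DefaultIfEmpty()
             // Prop4 (was Where(...).ToList()): group join, empty when nothing matches
             join m4 in Master4 on m1.id2 equals m4.id2 into m4matches
             // Prop6 (was Where(...).Select(x => x.id2).SingleOrDefault()): left join on m.id
             join m6 in Master6 on m.id equals m6.id into m6left
             from m6l in m6left.DefaultIfEmpty()
             select new
             {
                 Prop1 = m1,
                 Prop2 = m2l,
                 Prop4 = m4matches.ToList(),
                 Prop6 = m6l == null ? 0 : m6l.id2,   // 0 mirrors SingleOrDefault() on an int
                 Prop7 = m.id3,
             }).ToList();

foreach (var r in test3)
    Console.WriteLine($"{r.Prop1.id}: Prop2={r.Prop2?.name ?? "null"}, Prop4 count={r.Prop4.Count}, Prop6={r.Prop6}");
```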