Is there a LINQ alternative to the Where() operator with a List.Contains() call inside?

Time: 2013-09-18 14:35:03

Tags: c# linq optimization

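Using LINQ, is there a faster alternative to Where() with a List<T>.Contains() call inside the predicate that gives exactly the same results?

Here is an example:

List<int> a = ...
List<int> b = ...
var result = a.Where(x => b.Contains(x)); //very slow

One alternative I found is the Intersect() method:

var result = a.Intersect(b);

In the result variable, the order of the values in a is preserved. However, it does not give exactly the same results if a contains duplicates, because the Intersect() operator returns only distinct values.

Another way:

var result = a.Join(b, x => x, y => y, (x, y) => x);

Again, the results are not the same if b contains duplicates.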
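To make the differences concrete, here is a minimal sketch that runs the three variants on made-up sample lists with duplicates on both sides. The sample values and the variable names viaWhere, viaIntersect and viaJoin are illustrative assumptions, and the snippet assumes C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sample data (not from the question): duplicates in both lists.
var a = new List<int> { 1, 2, 2, 3, 5 };
var b = new List<int> { 2, 3, 3, 4 };

// Baseline: keeps the order and the duplicates of a, but scans b for every item of a.
var viaWhere = a.Where(x => b.Contains(x)).ToList();

// Intersect is a set operation: duplicates coming from a collapse to one value.
var viaIntersect = a.Intersect(b).ToList();

// Join emits one element per matching pair, so duplicates in b multiply the matches.
var viaJoin = a.Join(b, x => x, y => y, (x, y) => x).ToList();

Console.WriteLine(string.Join(", ", viaWhere));     // 2, 2, 3
Console.WriteLine(string.Join(", ", viaIntersect)); // 2, 3
Console.WriteLine(string.Join(", ", viaJoin));      // 2, 2, 3, 3
```

Only the first line of output matches the behavior asked for: every item of a that occurs in b, once per occurrence in a.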

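Is there any other possibility?

What I want to avoid:

  • Creating my own LINQ extension method
  • Creating a separate HashSet from the first list and using Contains() on it inside Where()

2 answers: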

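Answer 0: (score: 3)

Semantically, what you want is a left semi-join: each item of a is kept when it has at least one match in b. The LINQ Join operator performs an inner join, which is close but not quite the same. Fortunately, you can use GroupJoin and keep only the items whose group of matches is non-empty: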

var query = from n in a
            join k in b
            on n equals k into matches
            where matches.Any()
            select n;
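For readers who prefer method syntax, the query above can be sketched roughly as follows. The sample lists and the anonymous-type member names are illustrative assumptions, not part of the answer, and the snippet assumes C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sample data: duplicates in both lists.
var a = new List<int> { 1, 2, 2, 3, 5 };
var b = new List<int> { 2, 3, 3, 4 };

var query = a
    // Pair every item of a with the (possibly empty) group of equal items from b.
    .GroupJoin(b, n => n, k => k, (n, matches) => new { n, matches })
    // Keep only the items of a that have at least one match.
    .Where(pair => pair.matches.Any())
    .Select(pair => pair.n);

Console.WriteLine(string.Join(", ", query)); // 2, 2, 3
```

GroupJoin yields exactly one result per item of a, which is why duplicates in b do not multiply the output the way they do with Join.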

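Another option is to put the items of the second sequence into a HashSet, which can be searched much more efficiently than a List. (This is similar to what Join/GroupJoin do internally.)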

var set = new HashSet<int>(b);
var query = a.Where(n => set.Contains(n));

Another option is to use Join as you did, but first remove all duplicates from b; without duplicates in b, it behaves the way you want:

var result = a.Join(b.Distinct(), x => x, y => y, (x, y) => x);
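As a quick sanity check, the three options can be compared against the original Where()/Contains() query on a small sample. This is only a sketch: the sample data and the names baseline, viaGroupJoin, viaHashSet and viaJoinDistinct are assumptions made for illustration, and it assumes C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sample data: duplicates in both lists.
var a = new List<int> { 1, 2, 2, 3, 5 };
var b = new List<int> { 2, 3, 3, 4 };

// The slow query from the question, used as the reference result.
var baseline = a.Where(x => b.Contains(x)).ToList();

// Option 1: GroupJoin, keeping items whose match group is non-empty.
var viaGroupJoin = (from n in a
                    join k in b on n equals k into matches
                    where matches.Any()
                    select n).ToList();

// Option 2: HashSet lookup instead of a linear List scan.
var set = new HashSet<int>(b);
var viaHashSet = a.Where(n => set.Contains(n)).ToList();

// Option 3: Join against b with its duplicates removed.
var viaJoinDistinct = a.Join(b.Distinct(), x => x, y => y, (x, y) => x).ToList();

Console.WriteLine(baseline.SequenceEqual(viaGroupJoin));    // True
Console.WriteLine(baseline.SequenceEqual(viaHashSet));      // True
Console.WriteLine(baseline.SequenceEqual(viaJoinDistinct)); // True
```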

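Answer 1: (score: 0)

For speed, and to keep duplicates, I would use a traditional "for" loop.

Edited
I wrote some test code, taking into account:

  • Lists of 1000 random integers.
  • 200 tests per method.
  • 1, 2, 4 and 8 uses of each result, to show that if a result is used more than once, the IEnumerable<int> returned by LINQ should be converted to a better data structure such as List<int>.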
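The reason repeated use matters is deferred execution: a LINQ query runs again each time it is enumerated. The following sketch counts predicate calls to show this; the sample data and the names predicateCalls, lazy and cached are hypothetical, and it assumes C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sample data.
var a = new List<int> { 1, 2, 2, 3, 5 };
var set = new HashSet<int>(new[] { 2, 3, 4 });

int predicateCalls = 0;
// Deferred query: building it runs nothing yet.
IEnumerable<int> lazy = a.Where(n => { predicateCalls++; return set.Contains(n); });

int matches = 0;
for (int use = 0; use < 2; use++)
{
    matches = 0;
    foreach (var item in lazy) matches++; // each pass re-runs the filter over all of a
}
int lazyCalls = predicateCalls;

predicateCalls = 0;
List<int> cached = lazy.ToList(); // the filter runs once here
for (int use = 0; use < 2; use++)
{
    matches = 0;
    foreach (var item in cached) matches++; // plain list iteration, no predicate calls
}
int cachedCalls = predicateCalls;

Console.WriteLine($"lazy: {lazyCalls} predicate calls, cached: {cachedCalls}");
// lazy: 10 predicate calls, cached: 5
```

With two uses the deferred query does twice the filtering work; the gap grows linearly with the number of uses.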

The results are:

1 uses per result
Tigrou-Where        : count=  93,  3.167,0ms
Tigrou-Intersect    : count=  89,    116,0ms
Tigrou-Join         : count=  96,    179,0ms
Servy-GroupJoin     : count=  93,    262,0ms
Servy-HashSet       : count=  93,     71,0ms
Servy-JoinDisctinct : count=  93,    212,0ms
JoseH-TheOldFor     : count=  93,     72,0ms

2 uses per result
Tigrou-Where        : count=  93,  6.007,0ms
Tigrou-Intersect    : count=  89,    182,0ms
Tigrou-Join         : count=  96,    293,0ms
Servy-GroupJoin     : count=  93,    455,0ms
Servy-HashSet       : count=  93,     99,0ms
Servy-JoinDisctinct : count=  93,    407,0ms
JoseH-TheOldFor     : count=  93,     73,0ms

4 uses per result
Tigrou-Where        : count=  93, 11.866,0ms
Tigrou-Intersect    : count=  89,    353,0ms
Tigrou-Join         : count=  96,    565,0ms
Servy-GroupJoin     : count=  93,    899,0ms
Servy-HashSet       : count=  93,    165,0ms
Servy-JoinDisctinct : count=  93,    786,0ms
JoseH-TheOldFor     : count=  93,     73,0ms

8 uses per result
Tigrou-Where        : count=  93, 23.831,0ms
Tigrou-Intersect    : count=  89,    724,0ms
Tigrou-Join         : count=  96,  1.151,0ms
Servy-GroupJoin     : count=  93,  1.807,0ms
Servy-HashSet       : count=  93,    299,0ms
Servy-JoinDisctinct : count=  93,  1.570,0ms
JoseH-TheOldFor     : count=  93,     81,0ms

The code is:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        Random random = new Random(Environment.TickCount);
        var cases = 1000;
        List<int> a = new List<int>(cases);
        List<int> b = new List<int>(cases);
        for (int c = 0; c < cases; c++)
        {
            a.Add(random.Next(9999));
            b.Add(random.Next(9999));
        }

        var times = 100;
        var usesCount = 1;

        Console.WriteLine("{0} times", times);
        for (int u = 0; u < 4; u++)
        {
            Console.WriteLine();
            Console.WriteLine("{0} uses per result", usesCount);
            TestMethod(a, b, "Tigrou-Where", Where, times, usesCount);
            TestMethod(a, b, "Tigrou-Intersect", Intersect, times, usesCount);
            TestMethod(a, b, "Tigrou-Join", Join, times, usesCount);
            TestMethod(a, b, "Servy-GroupJoin", GroupJoin, times, usesCount);
            TestMethod(a, b, "Servy-HashSet", HashSet, times, usesCount);
            TestMethod(a, b, "Servy-JoinDisctinct", JoinDistinct, times, usesCount);
            TestMethod(a, b, "JoseH-TheOldFor", TheOldFor, times, usesCount);
            usesCount *= 2;
        }

        Console.ReadLine();
    }

    private static void TestMethod(List<int> a, List<int> b, string name, Func<List<int>, List<int>, IEnumerable<int>> method, int times, int usesCount)
    {
        var stopwatch = new Stopwatch();
        stopwatch.Start();
        int count = 0;
        for (int t = 0; t < times; t++)
        {
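            // Build the result (for the LINQ methods this is a deferred query; nothing runs yet)
            var result = method(a, b);
            // Enumerate the result usesCount times; a deferred query is re-executed on every pass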
            for (int u = 0; u < usesCount; u++)
            {
                count = 0;
                foreach (var item in result)
                {
                    count++;
                }
            }
        }
        stopwatch.Stop();
        Console.WriteLine("{0,-20}: count={1,4}, {2,8:N1}ms", 
            name, count, stopwatch.ElapsedMilliseconds);
    }

    private static IEnumerable<int> Where(List<int> a, List<int> b)
    {
        return a.Where(x => b.Contains(x));
    }

    private static IEnumerable<int> Intersect(List<int> a, List<int> b)
    {
        return a.Intersect(b); 
    }

    private static IEnumerable<int> Join(List<int> a, List<int> b)
    {
        return a.Join(b, x => x, y => y, (x, y) => x);
    }

    private static IEnumerable<int> GroupJoin(List<int> a, List<int> b)
    {
        return from n in a
               join k in b
               on n equals k into matches
               where matches.Any()
               select n;
    }

    private static IEnumerable<int> HashSet(List<int> a, List<int> b)
    {
        var set = new HashSet<int>(b);
        return a.Where(n => set.Contains(n));
    }

    private static IEnumerable<int> JoinDistinct(List<int> a, List<int> b)
    {
        return a.Join(b.Distinct(), x => x, y => y, (x, y) => x);
    }

    private static IEnumerable<int> TheOldFor(List<int> a, List<int> b)
    {
        var result = new List<int>();
        int countA = a.Count;
        var setB = new HashSet<int>(b);
        for (int loopA = 0; loopA < countA; loopA++)
        {
            var itemA = a[loopA];
            if (setB.Contains(itemA))
                result.Add(itemA);
        }
        return result;
    }
}

Changing one line of the code so that the result is converted to a List<int> before it is used, and running it with 8 uses per result:

8 uses per result
Tigrou-Where        : count=  97,  2.974,0ms
Tigrou-Intersect    : count=  91,     91,0ms
Tigrou-Join         : count= 105,    150,0ms
Servy-GroupJoin     : count=  97,    224,0ms
Servy-HashSet       : count=  97,     74,0ms
Servy-JoinDisctinct : count=  97,    223,0ms
JoseH-TheOldFor     : count=  97,     75,0ms
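The answer does not show which line was changed. One plausible reading is that the result is materialized with ToList() right after the method call. The sketch below illustrates that idea with a hypothetical Measure helper (not the answer's TestMethod), tiny made-up lists, and C# 9 top-level statements; timings will vary by machine, so none are shown.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Illustrative sample data, far smaller than the 1000-item lists in the benchmark.
var a = new List<int> { 1, 2, 2, 3, 5 };
var b = new List<int> { 2, 3, 3, 4 };

var lazyRun = Measure(() => a.Where(x => b.Contains(x)), times: 100, usesCount: 8, materialize: false);
var listRun = Measure(() => a.Where(x => b.Contains(x)), times: 100, usesCount: 8, materialize: true);

Console.WriteLine($"lazy: count={lazyRun.Count}, {lazyRun.Ms}ms");
Console.WriteLine($"list: count={listRun.Count}, {listRun.Ms}ms");

// Hypothetical helper: runs `method` `times` times and enumerates each result
// `usesCount` times, optionally converting it to a List<int> first.
static (long Ms, int Count) Measure(Func<IEnumerable<int>> method, int times, int usesCount, bool materialize)
{
    var stopwatch = Stopwatch.StartNew();
    int count = 0;
    for (int t = 0; t < times; t++)
    {
        IEnumerable<int> result = method();
        if (materialize)
            result = result.ToList(); // assumed change: filter once, then reuse the list
        for (int u = 0; u < usesCount; u++)
        {
            count = 0;
            foreach (var item in result)
                count++;
        }
    }
    stopwatch.Stop();
    return (stopwatch.ElapsedMilliseconds, count);
}
```

With materialize set to true the filter runs once per test instead of once per use, which is consistent with the 8-use timings above dropping back toward the 1-use figures.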

So I think the winner is the Servy-HashSet method, with a small variation:

var set = new HashSet<int>(b);
var result = a.Where(n => set.Contains(n)).ToList();