Question

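If I have two lists and I want to know whether they have at least one element in common, I have two options:

    lst1.Intersect(lst2).Any();

    lst1.Any(x => lst2.Contains(x));

Both options give me the result I expect, but I don't know which is the better choice. Which one is more efficient, and why?

Thanks.

EDIT: When I created this post, I was looking for the reason as well as the solution. I know I can run tests, but I wouldn't know the reason for the results. Is one faster than the other? Is one solution always better than the other?

For that reason I accepted Matthew's answer, not only for the test code but also because he explains when one is better than the other and why. I also really appreciate the contributions of Nicholas and Oren.

Thanks.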

Answer 1

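Oren's answer has an error in the way the stopwatch is used. It isn't reset at the end of the loop after the time taken by Any() has been measured.

Note how it goes back to the start of the loop with the stopwatch never being Reset(), so the time that is added to intersect includes the time taken by Any().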
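To illustrate the accumulation problem, here is a minimal sketch (not taken from either answer). StopwatchDemo and Measure are hypothetical names, and the 50 ms sleeps simply stand in for the work being timed. Calling Start() on a stopped Stopwatch continues from the previous reading, whereas Restart() begins again from zero.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class StopwatchDemo
{
    // Times two 50 ms sleeps with a single Stopwatch and returns both readings (in ms).
    // When useRestart is false, the second Start() continues from the first reading.
    public static (long First, long Second) Measure(bool useRestart)
    {
        var sw = new Stopwatch();

        sw.Start();
        Thread.Sleep(50);
        sw.Stop();
        long first = sw.ElapsedMilliseconds;

        if (useRestart) sw.Restart(); else sw.Start();
        Thread.Sleep(50);
        sw.Stop();
        long second = sw.ElapsedMilliseconds;

        return (first, second);
    }

    static void Main()
    {
        var accumulated = Measure(useRestart: false);
        var separate = Measure(useRestart: true);

        // Without a reset, the second reading also contains the first interval.
        Console.WriteLine($"Start() again: first={accumulated.First}ms, second={accumulated.Second}ms");
        Console.WriteLine($"Restart():     first={separate.First}ms, second={separate.Second}ms");
    }
}
```

The exact numbers vary from run to run, but the second reading in the "Start() again" case should come out at roughly double the first.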

Following is a corrected version.

A release build run outside any debugger gives this result on my PC:

    Intersect: 1ms
    Any: 6743ms

Note how I'm making two non-overlapping string lists for this test. Also note that this is a worst-case test.

Where there are many intersections (or intersections that happen to occur near the start of the data), Oren is quite correct to say that Any() should be faster.

If the real data usually contains intersections, then it's likely better to use Any(). Otherwise, use Intersect(). It's very data dependent.

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;

    namespace Demo
    {
        class Program
        {
            void run()
            {
                double intersect = 0;
                double any = 0;

                Stopwatch stopWatch = new Stopwatch();

                // Two non-overlapping lists: the worst case for Any().
                List<string> L1 = Enumerable.Range(0, 10000).Select(x => x.ToString()).ToList();
                List<string> L2 = Enumerable.Range(10000, 10000).Select(x => x.ToString()).ToList();

                for (int i = 0; i < 10; i++)
                {
                    // Restart() zeroes the stopwatch before each measurement.
                    stopWatch.Restart();
                    Intersect(L1, L2);
                    stopWatch.Stop();
                    intersect += stopWatch.ElapsedMilliseconds;

                    stopWatch.Restart();
                    Any(L1, L2);
                    stopWatch.Stop();
                    any += stopWatch.ElapsedMilliseconds;
                }

                Console.WriteLine("Intersect: " + intersect + "ms");
                Console.WriteLine("Any: " + any + "ms");
            }

            private static bool Any(List<string> lst1, List<string> lst2)
            {
                return lst1.Any(lst2.Contains);
            }

            private static bool Intersect(List<string> lst1, List<string> lst2)
            {
                return lst1.Intersect(lst2).Any();
            }

            static void Main()
            {
                new Program().run();
            }
        }
    }

For comparative purposes, I wrote my own test comparing int sequences:

    intersect took 00:00:00.0065928
    any took       00:00:08.6706195

Code:

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;

    namespace Demo
    {
        class Program
        {
            void run()
            {
                var lst1 = Enumerable.Range(0, 10000);
                var lst2 = Enumerable.Range(10000, 10000);

                int count = 10;

                DemoUtil.Time(() => lst1.Intersect(lst2).Any(), "intersect", count);
                DemoUtil.Time(() => lst1.Any(lst2.Contains), "any", count);
            }

            static void Main()
            {
                new Program().run();
            }
        }

        static class DemoUtil
        {
            public static void Print(this object self)
            {
                Console.WriteLine(self);
            }

            public static void Print(this string self)
            {
                Console.WriteLine(self);
            }

            public static void Print<T>(this IEnumerable<T> self)
            {
                foreach (var item in self)
                    Console.WriteLine(item);
            }

            // Runs the action count times and prints the total elapsed time.
            public static void Time(Action action, string title, int count)
            {
                var sw = Stopwatch.StartNew();

                for (int i = 0; i < count; ++i)
                    action();

                (title + " took " + sw.Elapsed).Print();
            }
        }
    }

If I also time this for an overlapping range by changing the lists to this and increasing count to 10000:

    var lst1 = Enumerable.Range(10000, 10000);
    var lst2 = Enumerable.Range(10000, 10000);

I get these results:

    intersect took 00:00:03.2607476
    any took       00:00:00.0019170

In this case, Any() is clearly much faster.

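Conclusion

The worst-case performance is very bad for Any() but acceptable for Intersect(). The best-case performance is extremely good for Any() and bad for Intersect(). (And the best case for Any() is probably the worst case for Intersect()!)

The Any() approach is O(N^2) in the worst case and O(1) in the best case. The Intersect() approach is always O(N) (since it uses hashing, not sorting; otherwise it would be O(N log N)).

You must also consider memory usage: the Intersect() method needs to take a copy of one of the inputs, whereas Any() doesn't.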
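To make the hashing and memory points concrete, here is a rough sketch of a hash-based "any common element" check. It is not the actual LINQ source; HashOverlapSketch and HasCommonElement are hypothetical names used only for illustration. It copies one input into a HashSet<T> (the extra memory mentioned above) and then streams the other input, stopping at the first hit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class HashOverlapSketch
{
    // Assumed helper, not part of LINQ: O(N) time overall, at the cost of
    // holding a hashed copy of `second` in memory.
    public static bool HasCommonElement<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        var seen = new HashSet<T>(second);   // one full pass over `second`

        foreach (T item in first)
        {
            // Each lookup is O(1) on average, unlike a scan of a List<T>.
            if (seen.Contains(item))
                return true;                 // stop at the first common element
        }

        return false;
    }

    static void Main()
    {
        var a = Enumerable.Range(0, 10000);      // 0..9999
        var b = Enumerable.Range(10000, 10000);  // 10000..19999, no overlap with a
        var c = Enumerable.Range(9999, 10);      // 9999..10008, shares 9999 with a

        Console.WriteLine(HasCommonElement(a, b)); // False
        Console.WriteLine(HasCommonElement(a, c)); // True
    }
}
```

The up-front copy is what keeps the worst case linear, and it is also why this style of check cannot match the O(1) best case of Any() with Contains() when a match sits at the very start of both inputs.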

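Therefore, to make the best decision you really need to know the characteristics of the real data, and actually perform tests.

If you really don't want Any() to turn into an O(N^2) operation in the worst case, then you should use Intersect(). However, the chances are that you will be best off using Any().

And of course, most of the time none of this matters!

Unless you've discovered this part of the code to be a bottleneck, this is of merely academic interest. You shouldn't waste your time on this kind of analysis if there's no problem. :)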

Answer 2

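It depends on the implementation of your IEnumerables.

Your first try (Intersect/Any) finds all the matches and then determines whether the result is empty or not. From the documentation, this looks to be something like an O(n) operation:

When the object returned by this method is enumerated, Intersect enumerates first, collecting all distinct elements of that sequence. It then enumerates second, marking those elements that occur in both sequences. Finally, the marked elements are yielded in the order in which they were collected.

Your second try (Any/Contains) enumerates over the first collection, an O(n) operation, and for each item in the first collection, enumerates over the second, another O(n) operation, to see if a matching element is found. That makes it something like an O(n²) operation, does it not? Which do you think might be faster?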
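The nested enumeration can be observed directly. The sketch below is an illustration, not code from the answer; ContainsCostDemo and CountVisited are hypothetical names. It wraps the second sequence in a counting iterator so that Contains() has to walk it element by element, then reports how many elements were visited.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ContainsCostDemo
{
    // Returns how many elements of `second` are enumerated while evaluating
    // first.Any(x => second.Contains(x)).
    public static long CountVisited(IEnumerable<int> first, IEnumerable<int> second)
    {
        long visited = 0;

        // The iterator hides the concrete collection type, so Contains()
        // cannot take a shortcut and must enumerate linearly.
        IEnumerable<int> Counted()
        {
            foreach (int item in second)
            {
                visited++;
                yield return item;
            }
        }

        IEnumerable<int> counted = Counted();
        first.Any(x => counted.Contains(x));
        return visited;
    }

    static void Main()
    {
        // Disjoint ranges: every one of the 100 lookups scans all 100 elements.
        Console.WriteLine(CountVisited(Enumerable.Range(0, 100), Enumerable.Range(100, 100))); // 10000

        // Identical ranges: the very first lookup succeeds on the first element.
        Console.WriteLine(CountVisited(Enumerable.Range(0, 100), Enumerable.Range(0, 100)));   // 1
    }
}
```

The two printed counts correspond to the quadratic worst case and the constant best case of the Any/Contains approach.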

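One thing to consider, though, is that the Contains() lookup for certain collection or set types (e.g., dictionaries, binary trees, or ordered collections that allow a binary search or hash table lookup) might be a cheap operation, if the Contains() implementation is smart enough to take advantage of the semantics of the collection upon which it is operating.

But you'll need to experiment with your collection types to find out which works better.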
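As one example of a cheaper lookup, the following sketch uses an ordered collection with binary search. SortedLookupDemo and AnyWithBinarySearch are hypothetical names, and sorting a copy of the second list is an assumption made for the demo; if the data is already kept sorted, that step can be skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SortedLookupDemo
{
    // Sorts a copy of lst2 once (O(n log n)), then answers each lookup with
    // a binary search (O(log n)) instead of a linear scan.
    public static bool AnyWithBinarySearch(List<int> lst1, List<int> lst2)
    {
        var sorted = new List<int>(lst2);
        sorted.Sort();

        // List<T>.BinarySearch returns a non-negative index when the item is found.
        return lst1.Any(x => sorted.BinarySearch(x) >= 0);
    }

    static void Main()
    {
        var a = new List<int> { 5, 3, 9 };
        var b = new List<int> { 10, 9, 8 };
        var c = new List<int> { 1, 2 };

        Console.WriteLine(AnyWithBinarySearch(a, b)); // True  (9 is in both)
        Console.WriteLine(AnyWithBinarySearch(a, c)); // False
    }
}
```

Any() still stops at the first match, so the good best case is kept while the worst case drops from quadratic to roughly O(n log n).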

Answer 3

See Matthew's answer for a complete and accurate breakdown.

It's relatively easy to mock up and try yourself:

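    // The enclosing method header is missing from the snippet; Run is an assumed name.
    // GenerateNumberStrings is not defined in this answer.
    static void Run()
    {
        bool found;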

        double intersect = 0;
        double any = 0;

        for (int i = 0; i < 100; i++)
        {
            List<string> L1 = GenerateNumberStrings(200000);
            List<string> L2 = GenerateNumberStrings(60000);
            Stopwatch stopWatch = new Stopwatch();

            stopWatch.Start();
            found = Intersect(L1, L2);
            stopWatch.Stop();
            intersect += stopWatch.ElapsedMilliseconds;

            stopWatch.Reset();

            stopWatch.Start();
            found = Any(L1, L2);
            stopWatch.Stop();
            any += stopWatch.ElapsedMilliseconds;
        }

        Console.WriteLine("Intersect: " + intersect + "ms");
        Console.WriteLine("Any: " + any + "ms");
    }

    private static bool Any(List<string> lst1, List<string> lst2)
    {
        return lst1.Any(x => lst2.Contains(x));
    }

    private static bool Intersect(List<string> lst1, List<string> lst2)
    {
        return lst1.Intersect(lst2).Any();
    }

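In the long run, you'll see that the Any method is significantly faster, likely because it doesn't require the memory allocations and setup that Intersect requires (Any stops and returns true as soon as it finds a match, whereas Intersect actually needs to store the matches in a new collection).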

Intersect and Any, or Contains and Any: which is more efficient for finding at least one common element?

3 answers: