Question

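Can anyone explain why the generic List's Contains() function is so slow? I have a List with about a million numbers, and code that constantly checks whether a specific number is among them. I tried doing the same thing with a Dictionary and its ContainsKey() function, and it was about 10-20 times faster than with the List.
Of course, I don't really want to use a Dictionary for this, because it wasn't meant to be used that way. So the real question is: is there an alternative to List.Contains() that isn't as awkward as Dictionary.ContainsKey()? Thanks in advance!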

Answer 1

If you are only checking for existence, HashSet<T> in .NET 3.5 is your best option: dictionary-like performance, but with no key/value pair, just the values:

    Random rand = new Random(); // the loop below needs a random number source
    HashSet<int> data = new HashSet<int>();
    for (int i = 0; i < 1000000; i++)
    {
        data.Add(rand.Next(50000000));
    }
    bool contains = data.Contains(1234567); // etc

Answer 2

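List.Contains is an O(n) operation.

Dictionary.ContainsKey is an O(1) operation on average, because it uses the hash code of the object as the key, which makes the lookup much quicker.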
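
To make the difference concrete, here is a minimal timing sketch. The class name LookupComparison, the helper CountHits, and the choice of int values are assumptions made for illustration, not something from the question. It runs the same probes against a List<int> and a HashSet<int> holding identical values. Actual timings depend on the machine, so none are quoted here.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class LookupComparison
{
    // Counts how many probes are found in whichever collection is passed in.
    public static int CountHits(ICollection<int> haystack, int[] probes)
    {
        int hits = 0;
        foreach (int p in probes)
            if (haystack.Contains(p))
                hits++;
        return hits;
    }

    public static void Main()
    {
        const int size = 1000000;            // roughly the list size from the question
        var list = new List<int>(size);
        for (int i = 0; i < size; i++)
            list.Add(i * 2);                 // even numbers only
        var set = new HashSet<int>(list);    // same values, hash-based lookup

        int[] probes = new int[200];
        for (int i = 0; i < probes.Length; i++)
            probes[i] = i * 1999;            // a mix of present (even) and absent (odd) values

        Stopwatch sw = Stopwatch.StartNew();
        int listHits = CountHits(list, probes);   // each probe scans the list linearly
        Console.WriteLine("List:    {0} hits in {1}", listHits, sw.Elapsed);

        sw = Stopwatch.StartNew();
        int setHits = CountHits(set, probes);     // each probe hashes straight to its bucket
        Console.WriteLine("HashSet: {0} hits in {1}", setHits, sw.Elapsed);
    }
}
```

Both calls should report the same number of hits; only the elapsed time should differ.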

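I don't think it's a good idea to have a List with a million entries. I don't think the List class was designed for that purpose. :)

Isn't it possible to save those million entities into an RDBMS, and run queries against that database?

If that is not possible, then I would use a Dictionary anyway.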

Answer 3

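I think I have the answer! Yes, it's true that Contains() on a list (array) is O(n), but if the array is short and you are using value types, it should still be quite fast. But using the CLR Profiler [free download from Microsoft], I discovered that Contains() was boxing values in order to compare them, which requires heap allocation, which is VERY expensive (slow). [Note: This is .Net 2.0; other .Net versions not tested.]

Here's the full story and solution. We have an enumeration called "VI" and made a class called "ValueIdList", which is an abstract type for a list (array) of VI objects. The original implementation dates from the ancient .Net 1.1 days, and it used an encapsulated ArrayList. We recently discovered in http://blogs.msdn.com/b/joshwil/archive/2004/04/13/112598.aspx that a generic list (List<VI>) performs much better than ArrayList on value types (like our enum VI), because the values don't have to be boxed. It's true, and it almost worked.

The CLR Profiler revealed a surprise. Here's a portion of the Allocation Graph:

ValueIdList::Contains bool(VI) 5.5MB (34.81%)
Generic.List::Contains bool(<UNKNOWN>) 5.5MB (34.81%)
Generic.ObjectEqualityComparer<T>::Equals bool(<UNKNOWN> <UNKNOWN>) 5.5MB (34.88%)
Values.VI 7.7MB (49.03%)

As you can see, Contains() surprisingly calls Generic.ObjectEqualityComparer.Equals(), which apparently requires boxing a VI value, which in turn requires an expensive heap allocation. It's odd that Microsoft would eliminate boxing on the list, only to require it again for a simple operation such as this.

Our solution was to rewrite the Contains() implementation, which in our case was easy to do since we were already encapsulating the generic list object (_items). Here's the simple code: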

public bool Contains(VI id) 
{
  return IndexOf(id) >= 0;
}

public int IndexOf(VI id) 
{ 
  int i, count;

  count = _items.Count;
  for (i = 0; i < count; i++)
    if (_items[i] == id)
      return i;
  return -1;
}

public bool Remove(VI id) 
{
  int i;

  i = IndexOf(id);
  if (i < 0)
    return false;
  _items.RemoveAt(i);

  return true;
}

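The comparison of VI values is now done in our own version of IndexOf(), which requires no boxing and is very fast. Our particular program sped up by 20% after this simple rewrite. O(n)... no problem! Just avoid the wasted memory usage!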
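
For a struct you control (an enum such as VI cannot do this), a related approach is to implement IEquatable<T>, so that the default comparer used by List<T>.Contains() can call a typed Equals overload. The sketch below uses two hypothetical types, PlainId and TypedId, which are assumptions for illustration only. It counts which Equals overload is reached. A call to Equals(object) on a struct means the argument had to be boxed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value type WITHOUT IEquatable<T>: only Equals(object) is available.
public struct PlainId
{
    public static int ObjectEqualsCalls;
    public readonly int Value;
    public PlainId(int value) { Value = value; }

    public override bool Equals(object obj)
    {
        ObjectEqualsCalls++;   // reaching this means the argument arrived boxed
        return obj is PlainId && ((PlainId)obj).Value == Value;
    }
    public override int GetHashCode() { return Value; }
}

// Hypothetical value type WITH IEquatable<T>: a typed overload exists.
public struct TypedId : IEquatable<TypedId>
{
    public static int ObjectEqualsCalls;
    public static int TypedEqualsCalls;
    public readonly int Value;
    public TypedId(int value) { Value = value; }

    public bool Equals(TypedId other)
    {
        TypedEqualsCalls++;    // typed comparison, no boxing needed
        return other.Value == Value;
    }
    public override bool Equals(object obj)
    {
        ObjectEqualsCalls++;
        return obj is TypedId && Equals((TypedId)obj);
    }
    public override int GetHashCode() { return Value; }
}

public static class BoxingDemo
{
    public static void Main()
    {
        var plain = new List<PlainId> { new PlainId(1), new PlainId(2), new PlainId(3) };
        var typed = new List<TypedId> { new TypedId(1), new TypedId(2), new TypedId(3) };

        plain.Contains(new PlainId(3));
        typed.Contains(new TypedId(3));

        Console.WriteLine("PlainId Equals(object) calls:  {0}", PlainId.ObjectEqualsCalls);
        Console.WriteLine("TypedId Equals(object) calls:  {0}", TypedId.ObjectEqualsCalls);
        Console.WriteLine("TypedId Equals(TypedId) calls: {0}", TypedId.TypedEqualsCalls);
    }
}
```

If the counters show Equals(object) being reached only for PlainId, the typed overload is what the list is using for TypedId. As with the original measurement, it is worth confirming this with a profiler on the runtime version you actually target.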

Answer 4

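A Dictionary isn't that bad, because the keys in a dictionary are designed to be found quickly. To find a number in a list, the whole list has to be walked.

Of course, the dictionary only works if your numbers are unique and you don't need them kept in order.

I think there is also a HashSet<T> class in .NET 3.5, which likewise allows only unique elements.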
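
A small sketch of the uniqueness point, using a hypothetical helper named CountDistinct (an assumption for illustration): HashSet<T>.Add returns false and leaves the set unchanged when the value is already present, so duplicates in the input simply collapse.

```csharp
using System;
using System.Collections.Generic;

public static class UniqueDemo
{
    // Returns how many distinct values end up stored in the set.
    public static int CountDistinct(IEnumerable<int> numbers)
    {
        var seen = new HashSet<int>();
        foreach (int n in numbers)
            seen.Add(n);   // returns false and changes nothing for a duplicate
        return seen.Count;
    }

    public static void Main()
    {
        int[] numbers = { 5, 7, 5, 9, 7 };
        Console.WriteLine(CountDistinct(numbers));   // prints 3
    }
}
```

If the original list can legitimately contain the same number several times and the count matters, a set alone loses that information.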

Answer 5

A SortedList will be faster to search (but slower to insert items into).
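
A minimal sketch of that trade-off, assuming the generic SortedList<TKey, TValue> is what is meant. The bool value is an unused placeholder, and the names SortedLookup and Build are hypothetical. ContainsKey does a binary search over the sorted keys, while each insert may have to shift existing entries to keep the keys in order.

```csharp
using System;
using System.Collections.Generic;

public static class SortedLookup
{
    // Builds a SortedList keyed by the numbers; the bool value is never read.
    public static SortedList<int, bool> Build(IEnumerable<int> numbers)
    {
        var sorted = new SortedList<int, bool>();
        foreach (int n in numbers)
            sorted[n] = true;   // indexer set: inserts in key order, or overwrites a duplicate
        return sorted;
    }

    public static void Main()
    {
        SortedList<int, bool> sorted = Build(new[] { 42, 7, 19, 7, 3 });
        Console.WriteLine(sorted.ContainsKey(19));   // binary search over the sorted keys
        Console.WriteLine(sorted.ContainsKey(20));
    }
}
```

If the data is loaded once and then only queried, sorting a plain List<int> with Sort() and calling BinarySearch() is a similar option that avoids the placeholder value.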

Answer 6

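This isn't exactly an answer to your question, but I have a class that improves the performance of Contains() on a collection. I subclassed Queue and added a Dictionary that maps a hash code to a list of objects. Dictionary.ContainsKey() is O(1), whereas List.Contains(), Queue.Contains(), and Stack.Contains() are O(n).

The value type of the dictionary is a queue holding the objects that share the same hash code. The caller can supply a custom class object that implements IEqualityComparer. You could use this pattern for a Stack or a List; the code would need only a few changes.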

/// <summary>
/// This is a class that mimics a queue, except the Contains() operation is O(1) rather than O(n) thanks to an internal dictionary.
/// The dictionary remembers the hashcodes of the items that have been enqueued and dequeued.
/// Hashcode collisions are stored in a queue to maintain FIFO order.
/// </summary>
/// <typeparam name="T"></typeparam>
private class HashQueue<T> : Queue<T>
{
    private readonly IEqualityComparer<T> _comp;
    public readonly Dictionary<int, Queue<T>> _hashes; //_hashes.Count doesn't always equal base.Count (due to collisions)

    public HashQueue(IEqualityComparer<T> comp = null) : base()
    {
        this._comp = comp;
        this._hashes = new Dictionary<int, Queue<T>>();
    }

    public HashQueue(int capacity, IEqualityComparer<T> comp = null) : base(capacity)
    {
        this._comp = comp;
        this._hashes = new Dictionary<int, Queue<T>>(capacity);
    }

    public HashQueue(IEnumerable<T> collection, IEqualityComparer<T> comp = null) : base(collection)
    {
        this._comp = comp;

        this._hashes = new Dictionary<int, Queue<T>>(base.Count);
        foreach (var item in collection)
        {
            this.EnqueueDictionary(item);
        }
    }

    public new void Enqueue(T item)
    {
        base.Enqueue(item); //add to queue
        this.EnqueueDictionary(item);
    }

    private void EnqueueDictionary(T item)
    {
        int hash = this._comp == null ? item.GetHashCode() : this._comp.GetHashCode(item);
        Queue<T> temp;
        if (!this._hashes.TryGetValue(hash, out temp))
        {
            temp = new Queue<T>();
            this._hashes.Add(hash, temp);
        }
        temp.Enqueue(item);
    }

    public new T Dequeue()
    {
        T result = base.Dequeue(); //remove from queue

        int hash = this._comp == null ? result.GetHashCode() : this._comp.GetHashCode(result);
        Queue<T> temp;
        if (this._hashes.TryGetValue(hash, out temp))
        {
            temp.Dequeue();
            if (temp.Count == 0)
                this._hashes.Remove(hash);
        }

        return result;
    }

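    public new bool Contains(T item)
    { //This is O(1) on average, whereas Queue.Contains is O(n)
        int hash = this._comp == null ? item.GetHashCode() : this._comp.GetHashCode(item);
        Queue<T> temp;
        if (!this._hashes.TryGetValue(hash, out temp))
            return false;

        //a matching hash code alone is not proof of equality, so compare against the bucket's items
        IEqualityComparer<T> comp = this._comp ?? EqualityComparer<T>.Default;
        foreach (T candidate in temp)
            if (comp.Equals(candidate, item))
                return true;
        return false;
    }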

    public new void Clear()
    {
        foreach (var item in this._hashes.Values)
            item.Clear(); //clear collision lists

        this._hashes.Clear(); //clear dictionary

        base.Clear(); //clear queue
    }
}

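My simple testing shows that HashQueue.Contains() runs much faster than Queue.Contains(). Running the test code with count set to 10,000 takes 0.00045 seconds for the HashQueue version and 0.37 seconds for the Queue version. With a count of 100,000, the HashQueue version takes 0.0031 seconds, whereas the Queue takes 36.38 seconds!

Here's my testing code: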

static void Main(string[] args)
{
    int count = 10000;

    { //HashQueue
        var q = new HashQueue<int>(count);

        for (int i = 0; i < count; i++) //load queue (not timed)
            q.Enqueue(i);

        System.Diagnostics.Stopwatch sw = System.Diagnostics.Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            bool contains = q.Contains(i);
        }
        sw.Stop();
        Console.WriteLine(string.Format("HashQueue, {0}", sw.Elapsed));
    }

    { //Queue
        var q = new Queue<int>(count);

        for (int i = 0; i < count; i++) //load queue (not timed)
            q.Enqueue(i);

        System.Diagnostics.Stopwatch sw = System.Diagnostics.Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            bool contains = q.Contains(i);
        }
        sw.Stop();
        Console.WriteLine(string.Format("Queue,     {0}", sw.Elapsed));
    }

    Console.ReadLine();
}

Answer 7

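Why is a dictionary inappropriate?

To see whether a particular value is in the list, you need to walk the entire list. With a dictionary (or another hash-based container) it's much quicker to narrow down the number of objects you need to compare against. The key (in your case, the number) is hashed, and that gives the dictionary the small subset of objects to compare against.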
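
The narrowing-down step can be sketched with a toy container. ToyBuckets is a hypothetical class written only to illustrate the idea; the real Dictionary and HashSet implementations are more elaborate. The hash code selects one bucket, and only that bucket's contents are compared.

```csharp
using System;
using System.Collections.Generic;

// Toy hash container for ints, for illustration only.
public class ToyBuckets
{
    private readonly List<int>[] _buckets;

    public ToyBuckets(int bucketCount)
    {
        _buckets = new List<int>[bucketCount];
        for (int i = 0; i < bucketCount; i++)
            _buckets[i] = new List<int>();
    }

    private List<int> BucketFor(int value)
    {
        // Clear the sign bit so the remainder is never negative.
        int index = (value.GetHashCode() & 0x7FFFFFFF) % _buckets.Length;
        return _buckets[index];
    }

    public void Add(int value)
    {
        BucketFor(value).Add(value);
    }

    // Scans only the bucket selected by the hash; 'candidates' reports that bucket's size.
    public bool Contains(int value, out int candidates)
    {
        List<int> bucket = BucketFor(value);
        candidates = bucket.Count;
        return bucket.Contains(value);
    }
}

public static class ToyBucketsDemo
{
    public static void Main()
    {
        var toy = new ToyBuckets(1024);
        for (int i = 0; i < 100000; i++)
            toy.Add(i);

        int candidates;
        bool found = toy.Contains(54321, out candidates);
        Console.WriteLine("found={0}, candidates={1} out of 100000", found, candidates);
    }
}
```

With 1024 buckets, a lookup compares against roughly a thousandth of the stored values instead of all of them.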

Answer 8

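I'm doing this on the Compact Framework, where HashSet is not supported, so I opted for a Dictionary in which both strings (the key and the value) are the value I'm looking for.

That means I get List<> functionality with Dictionary performance. It's a bit hacky, but it works.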
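
A minimal sketch of that workaround, with hypothetical names (DictionaryAsSet, Build) chosen for illustration: each string is stored as both key and value, so ContainsKey answers the "is it in the collection?" question.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryAsSet
{
    // Stores each string as both key and value, so the dictionary behaves like a set.
    public static Dictionary<string, string> Build(IEnumerable<string> items)
    {
        var lookup = new Dictionary<string, string>();
        foreach (string item in items)
            lookup[item] = item;   // indexer set tolerates duplicates; Add would throw on them
        return lookup;
    }

    public static void Main()
    {
        Dictionary<string, string> lookup = Build(new[] { "alpha", "beta", "alpha" });
        Console.WriteLine(lookup.ContainsKey("beta"));
        Console.WriteLine(lookup.ContainsKey("gamma"));
        Console.WriteLine(lookup.Count);
    }
}
```

Unlike a list, this keeps only one copy of each string and does not preserve insertion order as a guarantee.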

C#: List<T>.Contains() is too slow?
