Extension methods on IEnumerable<T>: how do they perform?

Asked: 2011-05-11 04:47:32

Tags: .net linq performance

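From my mentor: prefer native methods (implemented directly on the collection) over the IEnumerable extension methods, because:

"The LINQ-to-Objects extension methods are implemented on IEnumerable, which means that in the worst case (when the item you are searching for does not exist in the collection) you have to enumerate all the elements. If you have a Contains or Exists method implemented directly on the collection, it can take advantage of internal knowledge and perhaps just do a hash table lookup or some other fast operation."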

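I am quite confused, because I assumed Microsoft would already have implemented a hash table for the IEnumerable Contains / Exists methods. A quick benchmark of List against IEnumerable shows no difference: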

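using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
static void Main(string[] args)
{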
    Console.Write("input the number of elements: ");
    int count = Convert.ToInt32(Console.ReadLine());
    Console.Write("input the number of loops: ");
    int loop = Convert.ToInt32(Console.ReadLine());

    Random r = new Random();

    Stopwatch sw = new Stopwatch();
    for (int i = 0; i < loop; i++)
    {
        var list = CreateListOfInt(count);
        sw.Start();
        for (int j = 0; j < count; j++)
        {
            DoContains(list, r.Next());
        }
        sw.Stop();
    }

    Console.WriteLine("List<T> native method: Iterated {0} times on {1} elements, elapsed :{2}",loop,count,sw.Elapsed);

    sw.Reset();
    for (int i = 0; i < loop; i++)
    {
        var list = CreateListOfInt(count);
        sw.Start();
        for (int j = 0; j < count; j++)
        {
            DoContainsEnumerable(list, r.Next());
        }
        sw.Stop();
    }

    Console.WriteLine("IEnumerable<T> extension method: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);

    sw.Reset();
    for (int i = 0; i < loop; i++)
    {
        var list = CreateListOfInt2(count);
        sw.Start();
        for (int j = 0; j < count; j++)
        {
            // lookup values come from [20000, 50000), so they are never in the list
            DoContains(list, r.Next(20000, 50000));
        }
        sw.Stop();
    }
    Console.WriteLine("List<T> native method: element does not exist:Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);

    sw.Reset();
    for (int i = 0; i < loop; i++)
    {
        var list = CreateListOfInt2(count);
        sw.Start();
        for (int j = 0; j < count; j++)
        {
            // lookup values come from [20000, 50000), so they are never in the list
            DoContainsEnumerable(list, r.Next(20000, 50000));
        }
        sw.Stop();
    }
    Console.WriteLine("IEnumerable<T> extension method: element does not exist: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);


    Console.ReadKey();
}

static List<int> CreateListOfInt(int count)
{
    Random r = new Random(1000);
    List<int> numbers = new List<int>(count);
    for (int i = 0; i < count; i++)
    {
        numbers.Add(r.Next());
    }
    return numbers;
}

static bool DoContains(List<int> list, int number)
{
    return list.Contains(number);
}

static bool DoContainsEnumerable(IEnumerable<int> list, int number)
{
    return list.Contains(number);
}


// Restrict the generated values to [0, 10000) so that a lookup value outside that range is never in the list
static List<int> CreateListOfInt2(int count)
{
    Random r = new Random(1000);
    List<int> numbers = new List<int>(count);
    for (int i = 0; i < count; i++)
    {
        numbers.Add(r.Next(0,10000));
    }
    return numbers;
}

}

Edit: I tried a HashSet implementation, which improved performance greatly:

    sw.Reset();
    for (int i = 0; i < loop; i++)
    {
        var list = CreateListOfInt2(count);
        HashSet<int> hashtable = new HashSet<int>(list);
        sw.Start();
        for (int j = 0; j < count; j++)
        {
            // lookup values come from [20000, 50000), so they are never in the set
            hashtable.Contains(r.Next(20000, 50000));
        }
        sw.Stop();
    }
    Console.WriteLine("HashSet<T> native method: element does not exist: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);

Still, what is your opinion on what my mentor said?

Can anyone clear this up for me? Is my mentor right? If he is, what is wrong with my code?

Thanks a lot.

2 Answers:

Answer 0: (score: 4)

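List<T>.Contains calls simply iterate over the list, so they are no faster than the extension method. If you use a HashSet<T> and try a series of Contains() operations, you will see a marked improvement.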
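One way to see the difference without relying on timings is to count equality checks. The sketch below is only an illustration: CountingKey and ContainsCostDemo are hypothetical names, not part of the question's code, and the exact counts may depend on the runtime. It searches for a key that is absent from both a List<T> and a HashSet<T>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical key type that counts how often its Equals method runs.
sealed class CountingKey : IEquatable<CountingKey>
{
    public static int EqualsCalls;
    private readonly int _value;

    public CountingKey(int value) { _value = value; }

    public bool Equals(CountingKey other)
    {
        EqualsCalls++;
        return other != null && other._value == _value;
    }

    public override bool Equals(object obj) => Equals(obj as CountingKey);

    // Distinct values give distinct hash codes, so the set rarely needs Equals.
    public override int GetHashCode() => _value;
}

static class ContainsCostDemo
{
    // Equality checks made by List<T>.Contains for a key that is not present.
    public static int EqualsCallsForList(int size)
    {
        List<CountingKey> list = Enumerable.Range(0, size)
            .Select(i => new CountingKey(i))
            .ToList();
        CountingKey.EqualsCalls = 0;
        list.Contains(new CountingKey(-1));
        return CountingKey.EqualsCalls;
    }

    // Equality checks made by HashSet<T>.Contains for the same missing key.
    public static int EqualsCallsForHashSet(int size)
    {
        var set = new HashSet<CountingKey>(
            Enumerable.Range(0, size).Select(i => new CountingKey(i)));
        CountingKey.EqualsCalls = 0; // ignore comparisons made while building the set
        set.Contains(new CountingKey(-1));
        return CountingKey.EqualsCalls;
    }

    static void Main()
    {
        Console.WriteLine("List<T>:    " + EqualsCallsForList(1000) + " Equals calls");
        Console.WriteLine("HashSet<T>: " + EqualsCallsForHashSet(1000) + " Equals calls");
    }
}
```

The list has to compare the missing key against every element, while the set only looks at entries whose hash code matches, so its count should stay far below the list's.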

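Edit: The reason Microsoft did not use hashing for the IEnumerable<T> extension methods is that they cannot guarantee that the implementing class uses hashing or anything similar. They have to take the naive approach, because the IEnumerable<T> interface only guarantees that the implementing class can be enumerated.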

Answer 1: (score: 0)

The LINQ version uses the faster native implementation if the object has one.

For example, Count is implemented along these lines:

// simplified pseudocode, not the actual framework source
if (source is Array array)
    return array.Length;
if (source is ICollection<T> collection)
    return collection.Count;
// else iterate through all the items and count them.

And Contains like this:

if (source is ICollection<T> collection)
    return collection.Contains(item);
// else iterate through the enumerable and see whether item exists

Because HashSet<T> implements ICollection<T>, its native Contains is used.
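This dispatch can be observed directly. The following is a minimal sketch, assuming a hypothetical SpyCollection type (not from the framework or the question) that records which of its members Enumerable.Contains ends up calling when the collection is only seen as IEnumerable<int>.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical collection that records which member LINQ ends up calling.
sealed class SpyCollection : ICollection<int>
{
    private readonly HashSet<int> _items = new HashSet<int>();

    public bool NativeContainsCalled { get; private set; }
    public bool EnumeratorRequested { get; private set; }

    public int Count => _items.Count;
    public bool IsReadOnly => false;

    public void Add(int item) => _items.Add(item);
    public void Clear() => _items.Clear();
    public bool Remove(int item) => _items.Remove(item);
    public void CopyTo(int[] array, int arrayIndex) => _items.CopyTo(array, arrayIndex);

    public bool Contains(int item)
    {
        NativeContainsCalled = true;
        return _items.Contains(item);
    }

    public IEnumerator<int> GetEnumerator()
    {
        EnumeratorRequested = true;
        return _items.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

static class DispatchDemo
{
    public static SpyCollection Probe()
    {
        var spy = new SpyCollection { 1, 2, 3 };
        IEnumerable<int> asSequence = spy; // the static type hides the native method
        asSequence.Contains(42);           // binds to the Enumerable.Contains extension
        return spy;
    }

    static void Main()
    {
        SpyCollection spy = Probe();
        Console.WriteLine("Native Contains called: " + spy.NativeContainsCalled);
        Console.WriteLine("Enumerator requested:   " + spy.EnumeratorRequested);
    }
}
```

If the extension method takes the ICollection<T> shortcut described above, the native Contains flag is set and the enumerator is never requested. The same reasoning would explain why the benchmark in the question shows no difference between List<T> and IEnumerable<T>.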

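So LINQ is optimized for the standard interfaces. However, if your custom type has a native call that is not part of a default interface, the LINQ call may be slower.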
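To illustrate that last case, here is a sketch built on an assumed custom type. TagBag and its Has method are hypothetical names invented for this example. The type has a fast hash lookup, but exposes it only through its own member and implements IEnumerable<int> rather than ICollection<int>, so LINQ cannot find the shortcut.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: the fast lookup exists only as a custom member (Has).
sealed class TagBag : IEnumerable<int>
{
    private readonly HashSet<int> _items;

    // Number of elements handed out through the enumerator so far.
    public int ItemsYielded { get; private set; }

    public TagBag(IEnumerable<int> items) { _items = new HashSet<int>(items); }

    // Native hash lookup that LINQ knows nothing about.
    public bool Has(int item) => _items.Contains(item);

    public IEnumerator<int> GetEnumerator()
    {
        foreach (int item in _items)
        {
            ItemsYielded++;
            yield return item;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

static class FallbackDemo
{
    // Elements enumerated when LINQ searches for a value that is not present.
    public static int ItemsVisitedByLinq(int size)
    {
        var bag = new TagBag(Enumerable.Range(0, size));
        bag.Contains(-1); // Enumerable.Contains: no ICollection<int>, so it enumerates
        return bag.ItemsYielded;
    }

    // Elements enumerated when the custom member is used instead.
    public static int ItemsVisitedByNative(int size)
    {
        var bag = new TagBag(Enumerable.Range(0, size));
        bag.Has(-1);
        return bag.ItemsYielded;
    }

    static void Main()
    {
        Console.WriteLine("LINQ Contains visited: " + ItemsVisitedByLinq(1000));
        Console.WriteLine("Native Has visited:    " + ItemsVisitedByNative(1000));
    }
}
```

Implementing ICollection<int> on such a type and forwarding its Contains to the hash lookup would let the LINQ call take the fast path as well.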