Question

I tried doing the same thing with a LINQ approach and a non-LINQ approach, and found that the LINQ version is dramatically slower (~3000x).

Why?

LINQ way:

for (int i = 0; i < totalElements; i += stepSize)
{
    var currentBlock = testList
        .Skip(i)
        .Take(stepSize);

    result.Add(currentBlock.Sum());
}

result.ToList();

Non-LINQ way:

for (int i = 0; i < totalElements; i += stepSize)
{
    var currentBlock = testList.GetRange(i, stepSize);

    result2.Add(currentBlock.Sum());
}

result2.ToList();

Results:

Method: Linq, Time taken: 26667 ms, Elements: 1000000, Step Size: 100
Method: GetRange, Time taken: 9 ms, Elements: 1000000, Step Size: 100

Full source code, as requested:

static void Main(string[] args)
{
    var totalElements = 1000000;
    var testList = new List<int>(totalElements);
    var rand = new Random();

    // Initialize the list to random integers between 1 and 1000
    for (int i = 0; i < totalElements; i++)
    {
        testList.Add(rand.Next(1, 1000));
    }

    var result = new List<int>();
    var stepSize = 100;
    var stp = new Stopwatch();

    stp.Start();
    for (int i = 0; i < totalElements; i += stepSize)
    {
        var currentBlock = testList
            .Skip(i)
            .Take(stepSize);

        result.Add(currentBlock.Sum());
    }

    result.ToList();
    stp.Stop();

    Console.WriteLine($"Method: Linq, Time taken: {stp.ElapsedMilliseconds} ms, Elements: {totalElements}, Step Size: {stepSize}");

    stp.Reset();

    var result2 = new List<int>();
    stp.Start();

    for (int i = 0; i < totalElements; i += stepSize)
    {
        var currentBlock = testList.GetRange(i, stepSize);

        result2.Add(currentBlock.Sum());
    }

    result2.ToList();
    stp.Stop();

    Console.WriteLine($"Method: GetRange, Time taken: {stp.ElapsedMilliseconds} ms, Elements: {totalElements}, Step Size: {stepSize}");
}

Answer 1

The problem is how Skip works, which is completely different from GetRange. Skip always starts at the beginning of the enumeration, which means you are doing the following:

Iteration #1: Skip 0
Iteration #2: Skip 1 * step
Iteration #3: Skip 2 * step
Iteration #4: Skip 3 * step
Iteration #5: Skip 4 * step
....
Iteration #10,000: Skip 9,999 * step

If you do the math for 1,000,000 elements and a step of 100, you get:

sum = 1 + 2 + 3 + .... + 9,999 = 9,999 * (9,999 + 1) / 2 = 49,995,000
total elements skipped: 49,995,000 * 100 = 4,999,500,000

So your LINQ version performs a whopping 4,999,500,000 unnecessary iterations.
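If you want to double-check that figure, here is a small sketch that just adds up the skip counts, assuming every Skip(i) call really walks over i elements from the start. The class and method names (SkipCost, TotalSkipped) are made up for illustration.

```csharp
using System;

public static class SkipCost
{
    // Adds up how many elements Skip(i) steps over across all blocks,
    // assuming each Skip call starts again at the beginning of the sequence.
    public static long TotalSkipped(long totalElements, long stepSize)
    {
        long skipped = 0;
        for (long i = 0; i < totalElements; i += stepSize)
        {
            skipped += i; // Skip(i) passes over i elements before the block begins
        }
        return skipped;
    }
}
```

Calling SkipCost.TotalSkipped(1000000, 100) returns 4999500000, which matches the calculation above. Note the use of long: the total does not fit in an int.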

A good question here would be: why hasn't Skip been optimized for the case where source implements IList<T>? Clearly, that would be possible.
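As a rough sketch of what such a list-aware shortcut could look like: when the source is an IList<T>, you can jump straight to the start index through the indexer instead of enumerating up to it. SkipTakeByIndex is a hypothetical helper name, not something from the framework, and this is only one way to do it.

```csharp
using System;
using System.Collections.Generic;

public static class ListSliceExtensions
{
    // Hypothetical helper: yields up to `count` elements starting at index `start`.
    // It uses the IList<T> indexer, so nothing before `start` is ever visited.
    public static IEnumerable<T> SkipTakeByIndex<T>(this IList<T> source, int start, int count)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        // Clamp the range to the list bounds; long arithmetic avoids int overflow.
        long end = Math.Min((long)source.Count, (long)start + count);
        for (int i = Math.Max(start, 0); i < end; i++)
        {
            yield return source[i];
        }
    }
}
```

In the question's loop this would be used as result.Add(testList.SkipTakeByIndex(i, stepSize).Sum()). Unlike GetRange, it does not copy the block into a new list, and it silently truncates at the end of the list instead of throwing.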

Answer 2

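The problem is the use of Skip(). It always enumerates from the beginning. What you want is a function that divides your sequence into chunks without iterating over the sequence more often than is really needed.

This means that if you only want the first chunk, the function should not iterate beyond that chunk, and if you want the 10th chunk right after the 9th chunk, it should not start iterating from the beginning again.

How about this extension function?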

using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    public static IEnumerable<IEnumerable<TSource>> ToChuncks<TSource>(
        this IEnumerable<TSource> source, int chunkSize)
    {
        while (source.Any())                     // while there are elements left to chunk
        {
            // Hand out the next chunkSize elements as a lazily evaluated chunk
            yield return source.Take(chunkSize);

            // Re-point source at a deferred view that skips the returned chunk
            source = source.Skip(chunkSize);
        }
    }
}

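This function repeatedly checks whether anything is left in the source sequence. If so, it returns a chunk of data and moves the source past that chunk.

The intent is that your complete source is iterated at most twice: once if you iterate over the elements inside the chunks, and once if you iterate over the chunks themselves.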
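One caveat worth checking before relying on that: source.Skip(chunkSize) is deferred, so every pass through the loop stacks another Skip on top of the previous one. On LINQ implementations where Skip is not specialised for lists, each Any() and Take() may then walk the sequence from the start again. As an alternative, here is a minimal sketch that really does make a single pass, at the cost of buffering each chunk in a list. InChunksOf is a hypothetical name chosen for this example.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkingExtensions
{
    // Hypothetical helper: enumerates the source exactly once and hands out
    // each chunk as a fully buffered List<T>.
    public static IEnumerable<List<T>> InChunksOf<T>(this IEnumerable<T> source, int chunkSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var chunk = new List<T>(chunkSize);
        foreach (var item in source)
        {
            chunk.Add(item);
            if (chunk.Count == chunkSize)
            {
                yield return chunk;
                chunk = new List<T>(chunkSize); // start a fresh buffer for the next chunk
            }
        }

        if (chunk.Count > 0)
        {
            yield return chunk; // trailing chunk, shorter than chunkSize
        }
    }
}
```

With this, the loop from the question could be written as foreach (var block in testList.InChunksOf(stepSize)) result.Add(block.Sum());. If you are on .NET 6 or later, the built-in Enumerable.Chunk method covers the same use case.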
