Question

I wrote a method that uses System.Linq to subdivide a list of items into multiple lists. When I run this method for 50000 simple integers, it takes 59.862 seconds.

Stopwatch watchresult0 = new Stopwatch();
watchresult0.Start();
var result0 = SubDivideListLinq(Enumerable.Range(0, 50000), 100).ToList();
watchresult0.Stop();
long elapsedresult0 = watchresult0.ElapsedMilliseconds;

So I tried to speed it up and wrote it with a simple foreach loop that iterates over every item in my list (SubDivideList). That version needs only 4 milliseconds:

Stopwatch watchresult1 = new Stopwatch();
watchresult1.Start();
var result1 = SubDivideList(Enumerable.Range(0, 50000), 100).ToList();
watchresult1.Stop();
long elapsedresult1 = watchresult1.ElapsedMilliseconds;

This is my Subdivide method using Linq:

private static IEnumerable<List<T>> SubDivideListLinq<T>(IEnumerable<T> enumerable, int count)
{
    while (enumerable.Any())
    {
        yield return enumerable.Take(count).ToList();
        enumerable = enumerable.Skip(count);
    }
}
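
The foreach-based SubDivideList that the question times is not listed here. For comparison, a minimal sketch of what such a single-pass loop could look like follows. It is an assumption modeled on the extension method in Answer 1, not necessarily the asker's exact code, and the class name SubdivideSketch is made up for illustration.

```csharp
using System.Collections.Generic;

public static class SubdivideSketch
{
    // Assumed shape of the foreach-based method (hypothetical reconstruction).
    public static IEnumerable<List<T>> SubDivideList<T>(IEnumerable<T> enumerable, int count)
    {
        var items = new List<T>(count);
        foreach (T item in enumerable)      // single pass over the source
        {
            items.Add(item);
            if (items.Count < count) continue;
            yield return items;             // hand out the full chunk
            items = new List<T>(count);     // start a fresh list for the next chunk
        }
        if (items.Count > 0)
            yield return items;             // trailing partial chunk, if any
    }
}
```

Because every item is touched exactly once, the cost grows linearly with the length of the input.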

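Do you have any idea why my own implementation is so much faster than splitting with Linq? Or am I doing something wrong?

And: as you can see, I know how to split lists, so this is not a duplicate of the related question. I want to know about the performance difference between Linq and my implementation, not how to split lists.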

Answer 1

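In case anyone comes here with the same question:

In the end I did some research and found that multiple enumeration with System.Linq is the reason for the poor performance. Every loop pass stacks another Skip on the sequence, and each call to Any() and Take() starts over at the beginning of the source.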
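
To make the multiple enumeration visible, here is a small sketch that counts how many elements are pulled from the source while the Take/Skip method from the question runs. The names PullCounter, CountedRange and CountPulls are hypothetical helpers introduced only for this illustration, and the exact count may vary between runtimes.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MultipleEnumerationDemo
{
    // Hypothetical helper: records how many elements the source has produced.
    public sealed class PullCounter { public long Pulls; }

    // A lazy source that increments the counter for every element it yields.
    public static IEnumerable<int> CountedRange(int n, PullCounter counter)
    {
        for (int i = 0; i < n; i++)
        {
            counter.Pulls++;
            yield return i;
        }
    }

    // Same shape as the Take/Skip method from the question.
    public static IEnumerable<List<T>> SubDivideListLinq<T>(IEnumerable<T> enumerable, int count)
    {
        while (enumerable.Any())
        {
            yield return enumerable.Take(count).ToList();
            enumerable = enumerable.Skip(count);
        }
    }

    // Runs the subdivision to completion and reports the total number of pulls.
    public static long CountPulls(int n, int count)
    {
        var counter = new PullCounter();
        foreach (var chunk in SubDivideListLinq(CountedRange(n, counter), count))
        {
            // Nothing to do: we only care about how often the source is read.
        }
        return counter.Pulls;
    }
}
```

Calling MultipleEnumerationDemo.CountPulls(1000, 100) should report several times more than 1000 pulls, because each chunk has to walk past all earlier items again. A single-pass loop would pull each element exactly once.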

When I enumerated the sequence into an array to avoid the multiple enumeration, performance got much better (14 ms for 50k items):

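// Materialize the source once so Any/Take/Skip no longer re-run the original sequence.
T[] allItems = enumerable as T[] ?? enumerable.ToArray();
while (allItems.Any())
{
    yield return allItems.Take(count);
    // Note: this still copies the remaining items into a new array on every pass.
    allItems = allItems.Skip(count).ToArray();
}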

Still, I won't use the Linq approach, because it is slower. Instead I wrote an extension method to subdivide my lists, which takes 3 ms for 50k items:

public static class EnumerableExtensions
{
    public static IEnumerable<List<T>> Subdivide<T>(this IEnumerable<T> enumerable, int count)
    {

        List<T> items = new List<T>(count);
        int index = 0;
        foreach (T item in enumerable)
        {
            items.Add(item);
            index++;
            if (index != count) continue;
            yield return items;
            items = new List<T>(count);
            index = 0;
        }
        if (index != 0 && items.Any())
            yield return items;
    }
}

As @AndreasNiedermair already wrote, this is also included in the MoreLinq library, where it is called Batch. (But I won't add a library just for this one method.)
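
For readers on a newer runtime, a built-in alternative exists as well. The sketch below assumes .NET 6 or later, where System.Linq provides Enumerable.Chunk. The wrapper name SplitWithChunk is made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ChunkSketch
{
    // Assumes .NET 6 or later, where Enumerable.Chunk is part of System.Linq.
    public static List<int[]> SplitWithChunk(IEnumerable<int> source, int size)
    {
        // Chunk yields arrays of at most `size` items; the last one may be shorter.
        return source.Chunk(size).ToList();
    }
}
```

Note that Chunk returns arrays rather than List<T>, so callers that need lists would have to convert each chunk.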

Answer 2

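If you are after both readability and performance, you may want to use this algorithm instead. In terms of speed it comes very close to your non-Linq version, and at the same time it is more readable.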

private static IEnumerable<List<T>> SubDivideListLinq<T>(IEnumerable<T> enumerable, int count)
{
    int index = 0;
    return enumerable.GroupBy(l => index++/count).Select(l => l.ToList());
}

An alternative in query syntax:

private static IEnumerable<List<T>> SubDivideListLinq<T>(IEnumerable<T> enumerable, int count)
{
    int index = 0;
    return from l in enumerable
        group l by index++/count
        into l select l.ToList();
}

Another option:

private static IEnumerable<List<T>> SubDivideListLinq<T>(IEnumerable<T> enumerable, int count)
{
    int index = 0;
    return enumerable.GroupBy(l => index++/count, 
                             item => item, 
                             (key,result) => result.ToList());
}

On my machine I get 0.006 sec for Linq versus 0.002 sec for non-Linq, which is perfectly fair and acceptable for using Linq.

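As a piece of advice: don't torture yourself with micro-optimizing code. Nobody will notice a difference of a few milliseconds, so write code that you and others can easily understand later.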
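
One caveat about the GroupBy variants in this answer: they rely on a captured counter (index) that is never reset. The returned query is lazy, so if it is enumerated a second time the counter keeps counting and the chunk boundaries shift. A possible variation, sketched here with the indexed Select overload, avoids the shared state. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GroupByIndexSketch
{
    // Uses the Select overload that supplies each item's position,
    // so no mutable counter is shared between enumerations.
    public static IEnumerable<List<T>> SubDivide<T>(IEnumerable<T> enumerable, int count)
    {
        return enumerable
            .Select((item, i) => new { item, i })      // pair each item with its index
            .GroupBy(x => x.i / count, x => x.item)    // integer division gives the chunk number
            .Select(g => g.ToList());
    }
}
```

The question's benchmark calls ToList() exactly once, so the original variants behave correctly there. The difference only matters when the result is enumerated more than once.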

Performance issue with System.Linq when subdividing a list into multiple lists
