Question

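Please read the edits below for more information.

I have some code below that splits a generic list of objects whenever an item is of a certain type.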

    public static IEnumerable<object>[] Split(this  IEnumerable<object> tokens, TokenType type) {

        List<List<object>> t = new List<List<object>>();
        int currentT = 0;
        t.Add(new List<object>());
        foreach (object list in tokens) {
            if ((list is Token) && (list as Token).TokenType == type) {
                currentT++;
                t.Add(new List<object>());

            }
            else if ((list is TokenType) && ((TokenType)list )== type) {
                currentT++;
                t.Add(new List<object>());

            }
            else {
                t[currentT].Add(list);
            }
        }

        return t.ToArray();
    }

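I don't have a specific question; I'm just curious whether anyone knows of a way to optimize this code. I call it many times, and it seems to be quite a beast in terms of clock cycles. Any ideas? If anyone is interested, I can also make this a wiki so we can keep track of the latest changes.

Update: I'm trying to parse out specific tokens. The input is a list of Token instances mixed with instances of some other classes. Token has a TokenType property (an enum). I need to find all the Token instances and split on each of them.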

{a b c T d e T f g h T i j k l T m}

would be split like this:

{a b c}{d e}{f g h}{i j k l}{m}

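Edit update: It seems that all of my speed problems come from constantly creating and adding to the generic lists. Does anyone know how I could do this without that? In case it helps anyone, this is what is happening: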

alt text http://i49.tinypic.com/1zvpcmq.png

Answer 1

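~~Your code looks fine.~~

My only suggestion would be to replace IEnumerable<object> with the non-generic IEnumerable (in System.Collections).

Edit:

On further inspection, you are casting more often than necessary.

Replace the if with the following code:

var token = list as Token;
if (token != null && token.TokenType == type) {

Also, you can get rid of the currentT variable by writing t[t.Count - 1] or t.Last(). This makes the code clearer, but may have a tiny negative effect on performance. Alternatively, you could store a reference to the inner list in a variable and use it directly. (This would slightly improve performance.)

Finally, if you can change the return type to List<List<Object>>, you can return t directly; this avoids an array copy and will be noticeably faster for large lists.

By the way, your variable names are confusing; you should swap the names of t and list.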
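
Taken together, these suggestions might look roughly like the sketch below. The Token and TokenType definitions are minimal stand-ins (the question does not show them), and the method is renamed to SplitToLists because its return type changes; both are assumptions made for illustration.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-ins for the asker's types, which the question does not show.
public enum TokenType { Separator, Other }

public class Token
{
    public TokenType TokenType { get; set; }
}

public static class TokenSplitExtensions
{
    // Returns the lists directly instead of copying them into an array.
    public static List<List<object>> SplitToLists(this IEnumerable items, TokenType type)
    {
        var result = new List<List<object>>();
        var current = new List<object>();   // reference to the inner list being filled
        result.Add(current);

        foreach (object item in items)
        {
            var token = item as Token;      // one cast instead of "is" followed by "as"
            bool isSeparator =
                (token != null && token.TokenType == type) ||
                (item is TokenType && (TokenType)item == type);

            if (isSeparator)
            {
                // Start a new inner list and keep a direct reference to it.
                current = new List<object>();
                result.Add(current);
            }
            else
            {
                current.Add(item);
            }
        }
        return result;
    }
}
```

Whether this is measurably faster depends on the data, so it is worth profiling before and after.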

Answer 2

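Type tests and casts can be performance killers. If possible, your token types should implement a common interface or abstract class. Instead of passing in object, you should pass in an IToken that wraps your objects.

Here is some concept code you can use to get started: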

using System;
using System.Collections.Generic;

namespace Juliet
{
    interface IToken<T>
    {
        bool IsDelimeter { get; }
        T Data { get; }
    }

    class DelimeterToken<T> : IToken<T>
    {
        public bool IsDelimeter { get { return true; } }
        public T Data { get { throw new Exception("No data"); } }
    }

    class DataToken<T> : IToken<T>
    {
        public DataToken(T data)
        {
            this.Data = data;
        }

        public bool IsDelimeter { get { return false; } }
        public T Data { get; private set; }
    }

    class TokenFactory<T>
    {
        public IToken<T> Make()
        {
            return new DelimeterToken<T>();
        }

        public IToken<T> Make(T data)
        {
            return new DataToken<T>(data);
        }
    }

    class Program
    {

        static List<List<T>> SplitTokens<T>(IEnumerable<IToken<T>> tokens)
        {
            List<List<T>> res = new List<List<T>>();
            foreach (IToken<T> token in tokens)
            {
                if (token.IsDelimeter)
                {
                    res.Add(new List<T>());
                }
                else
                {
                    if (res.Count == 0)
                    {
                        res.Add(new List<T>());
                    }

                    res[res.Count - 1].Add(token.Data);
                }
            }

            return res;
        }

        static void Main(string[] args)
        {
            TokenFactory<string> factory = new TokenFactory<string>();
            IToken<string>[] tokens = new IToken<string>[]
                {
                    factory.Make("a"), factory.Make("b"), factory.Make("c"), factory.Make(),
                    factory.Make("d"), factory.Make("e"), factory.Make(),
                    factory.Make("f"), factory.Make("g"), factory.Make("h"), factory.Make(),
                    factory.Make("i"), factory.Make("j"), factory.Make("k"), factory.Make("l"), factory.Make(),
                    factory.Make("m")
                };

            List<List<string>> splitTokens = SplitTokens(tokens);
            for (int i = 0; i < splitTokens.Count; i++)
            {
                Console.Write("{");
                for (int j = 0; j < splitTokens[i].Count; j++)
                {
                    Console.Write("{0}, ", splitTokens[i][j]);
                }
                Console.Write("}");
            }

            Console.ReadKey(true);
        }
    }
}

In principle, you can create instances of IToken<object> to generalize this to tokens of multiple classes.
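
As a rough illustration of that last point, the sketch below wraps values of unrelated types in IToken<object>. It repeats trimmed-down versions of the types above so that it compiles on its own; the MixedTokens class and its members are hypothetical names, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;

interface IToken<T>
{
    bool IsDelimeter { get; }
    T Data { get; }
}

class DelimeterToken<T> : IToken<T>
{
    public bool IsDelimeter { get { return true; } }
    public T Data { get { throw new InvalidOperationException("No data"); } }
}

class DataToken<T> : IToken<T>
{
    public DataToken(T data) { Data = data; }
    public bool IsDelimeter { get { return false; } }
    public T Data { get; private set; }
}

static class MixedTokens
{
    // Same loop as SplitTokens above, with T fixed to object.
    public static List<List<object>> Split(IEnumerable<IToken<object>> tokens)
    {
        var res = new List<List<object>>();
        foreach (IToken<object> token in tokens)
        {
            if (token.IsDelimeter)
            {
                res.Add(new List<object>());
            }
            else
            {
                if (res.Count == 0) res.Add(new List<object>());
                res[res.Count - 1].Add(token.Data);
            }
        }
        return res;
    }

    // Builds a token stream holding a string, a boxed int and a Guid.
    public static List<List<object>> Demo()
    {
        var tokens = new IToken<object>[]
        {
            new DataToken<object>("a"),
            new DataToken<object>(42),
            new DelimeterToken<object>(),
            new DataToken<object>(Guid.Empty)
        };
        return Split(tokens);
    }
}
```

Value types such as int are boxed when stored as object, so this trades some of the benefit of generics for flexibility.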

Answer 3

A: If you are only going to iterate over the results in nested foreach loops, a fully lazy implementation may be enough:

using System;
using System.Collections.Generic;

public static class Splitter
{
    public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> source, Predicate<T> match)
    {
        using (IEnumerator<T> enumerator = source.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                yield return Split(enumerator, match);
            }
        }
    }

    static IEnumerable<T> Split<T>(IEnumerator<T> enumerator, Predicate<T> match)
    {
        do
        {
            if (match(enumerator.Current))
            {
                yield break;
            }
            else
            {
                yield return enumerator.Current;
            }
        } while (enumerator.MoveNext());
    }
}

Use it like this:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace MyTokenizer
{
    class Program
    {
        enum TokenTypes { SimpleToken, UberToken }

        class Token { public TokenTypes TokenType = TokenTypes.SimpleToken;    }

        class MyUberToken : Token { public MyUberToken() { TokenType = TokenTypes.UberToken; } }

        static void Main(string[] args)
        {
            List<object> objects = new List<object>(new object[] { "A", Guid.NewGuid(), "C", new MyUberToken(), "D", new MyUberToken(), "E", new MyUberToken() });
            var splitOn = TokenTypes.UberToken;
            foreach (var list in objects.Split(x => x is Token && ((Token)x).TokenType == splitOn))
            {
                foreach (var item in list)
                {
                    Console.WriteLine(item);
                }
                Console.WriteLine("==============");
            }
            Console.ReadKey();
        }

    }
}

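B: If you need to process the results later, want to process them out of order, or partition on one thread and then possibly hand the segments out to multiple threads, then this might be a good starting point: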

using System;
using System.Collections.Generic;
using System.Linq;

public static class Splitter2
{
    public static IEnumerable<IEnumerable<T>> SplitToSegments<T>(this IEnumerable<T> source, Predicate<T> match)
    {
        T[] items = source.ToArray();
        for (int startIndex = 0; startIndex < items.Length; startIndex++)
        {
            int endIndex = startIndex;
            for (; endIndex < items.Length; endIndex++)
            {
                if (match(items[endIndex])) break;
            }
            yield return EnumerateArraySegment(items, startIndex, endIndex - 1);
            startIndex = endIndex;
        }
    }

    static IEnumerable<T> EnumerateArraySegment<T>(T[] array, int startIndex, int endIndex)
    {
        for (; startIndex <= endIndex; startIndex++)
        {
            yield return array[startIndex];
        }
    }
}

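C: If you really must return a collection of List<T>s (which I doubt, unless you explicitly want to mutate them later), then try initializing them to the required capacity before copying: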

public static List<List<T>> SplitToLists<T>(this IEnumerable<T> source, Predicate<T> match)
{
    List<List<T>> lists = new List<List<T>>();
    T[] items = source.ToArray();
    for (int startIndex = 0; startIndex < items.Length; startIndex++)
    {
        int endIndex = startIndex;
        for (; endIndex < items.Length; endIndex++)
        {
            if (match(items[endIndex])) break;
        }
        List<T> list = new List<T>(endIndex - startIndex);
        list.AddRange(EnumerateArraySegment(items, startIndex, endIndex - 1));
        lists.Add(list);
        startIndex = endIndex;
    }
    return lists;
}

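D: If that is still not enough, I suggest rolling your own lightweight List implementation that can copy a range from another instance directly into its internal array.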
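
A full custom list is too much for a short example, but the underlying idea of copying each range in one block rather than item by item can be sketched with plain arrays and Array.Copy. Note that this swaps the suggested custom list for T[] segments; the Splitter3 class and the SplitToArrays name are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Splitter3
{
    public static List<T[]> SplitToArrays<T>(this IEnumerable<T> source, Predicate<T> match)
    {
        var segments = new List<T[]>();
        T[] items = source.ToArray();
        int startIndex = 0;
        while (startIndex < items.Length)
        {
            // Find the end of the current run (exclusive).
            int endIndex = startIndex;
            while (endIndex < items.Length && !match(items[endIndex])) endIndex++;

            // Copy the whole run in one block instead of adding item by item.
            var segment = new T[endIndex - startIndex];
            Array.Copy(items, startIndex, segment, 0, segment.Length);
            segments.Add(segment);

            startIndex = endIndex + 1; // skip the delimiter itself
        }
        return segments;
    }
}
```

Like variants B and C, this produces an empty segment for a leading delimiter or for two adjacent delimiters, and nothing extra for a trailing one.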

Answer 4

My first thought is that, instead of looking up t[currentT] all the time, you could just store a currentList and add to it directly.

Answer 5

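I think there are some broken cases in scenarios like the following, assuming list items are lowercase letters and the item with the matching token type is T:

{T a b c ...};
{... x y z T};
{... j k l T T m n o ...};
{T}; and
{}

These will result in:

{{} {a b c ...}};
{{... x y z} {}};
{{... j k l} {} {} {m n o ...}};
{{}}; and
{}

Doing a straight refactoring: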

public static IEnumerable<object>[] Split(this IEnumerable<object> tokens,
                                          TokenType type) {
    var outer = new List<List<object>>();
    var inner = new List<object>();
    foreach (var item in tokens) {
        Token token = item as Token;
        if (token != null && token.TokenType == type) {
            outer.Add(inner);
            inner = new List<object>();
            continue;
        }
        inner.Add(item);
    }
    outer.Add(inner);
    return outer.ToArray();
}

To fix the broken cases (assuming those cases really are broken), I suggest:

public static IEnumerable<object>[] Split(this IEnumerable<object> tokens,
                                          TokenType type) {
    var outer = new List<List<object>>();
    var inner = new List<object>();
    var enumerator = tokens.GetEnumerator();
    while (enumerator.MoveNext()) {
        Token token = enumerator.Current as Token;
        if (token == null || token.TokenType != type) {
            inner.Add(enumerator.Current);
        }
        else if (inner.Count > 0) {
            outer.Add(inner);
            inner = new List<object>();
        }
    }
    // Flush the items that follow the last separator
    if (inner.Count > 0) {
        outer.Add(inner);
    }
    return outer.ToArray();
}

This will result in:

{{a b c ...}};
{{... x y z}};
{{... j k l} {m n o ...}};
{}; and
{}

Answer 6

With LINQ you could try this (I have not tested it...):

    public static IEnumerable<object>[] Split(this  IEnumerable<object> tokens, TokenType type)
    {
        List<List<object>> l = new List<List<object>>();
        l.Add(new List<object>());
        // c is the accumulator (the list of lists), n is the next item
        return tokens.Aggregate(l, (c, n) => 
        {
            var t = n as Token;
            if (t != null && t.TokenType == type)
            {
                c.Add(new List<object>());
            }
            else
            {
                c.Last().Add(n);
            }
            return c;
        }).ToArray();
    }

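Second attempt:

// An iterator (yield return) cannot return an array, so this returns a sequence of sequences.
public static IEnumerable<IEnumerable<object>> Split(this IEnumerable<object> tokens, TokenType type)
{
    // Indexes of all separator tokens; note that tokens is enumerated again for every segment.
    var indexes = tokens.Select((t, index) => new { token = t as Token, index = index })
                        .Where(t => t.token != null && t.token.TokenType == type)
                        .Select(t => t.index);
    int prevIndex = -1;
    foreach (int item in indexes)
    {
        // Local copies, because the lambda below runs lazily
        int start = prevIndex, end = item;
        yield return tokens.Where((t, index) => index > start && index < end);
        prevIndex = item;
    }
    // Items after the last separator
    yield return tokens.Where((t, index) => index > prevIndex);
}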

Answer 7

Here is one possibility.

The Token class (this could be any class):

public class Token
{
    public string Name { get; set; }
    public TokenType TokenType { get; set; }
}

Now the type enum (this could be any other grouping factor):

public enum  TokenType
{
    Type1,
    Type2,
    Type3,
    Type4,
    Type5,
}

The extension method (declare it however you choose):

public static class TokenExtension
{
    public static IEnumerable<Token>[] Split(this IEnumerable<Token> tokens)
    {
        return tokens.GroupBy(token => ((Token)token).TokenType).ToArray();
    }
}

Sample usage (I used a web project to spin this up):

List<Token> tokens = new List<Token>();
        tokens.Add(new Token { Name = "a", TokenType = TokenType.Type1 });
        tokens.Add(new Token { Name = "b", TokenType = TokenType.Type1 });
        tokens.Add(new Token { Name = "c", TokenType = TokenType.Type1 });

        tokens.Add(new Token { Name = "d", TokenType = TokenType.Type2 });
        tokens.Add(new Token { Name = "e", TokenType = TokenType.Type2  });

        tokens.Add(new Token { Name = "f", TokenType = TokenType.Type3 });
        tokens.Add(new Token { Name = "g", TokenType = TokenType.Type3 });
        tokens.Add(new Token { Name = "h", TokenType = TokenType.Type3 });

        tokens.Add(new Token { Name = "i", TokenType = TokenType.Type4 });
        tokens.Add(new Token { Name = "j", TokenType = TokenType.Type4 });
        tokens.Add(new Token { Name = "k", TokenType = TokenType.Type4 });
        tokens.Add(new Token { Name = "l", TokenType = TokenType.Type4 });

        tokens.Add(new Token { Name = "m", TokenType = TokenType.Type5 });

        StringBuilder stringed = new StringBuilder();

        foreach (Token token in tokens)
        {
            stringed.Append(token.Name);
            stringed.Append(", ");
        }

        Response.Write(stringed.ToString());
        Response.Write("</br>");


        var q = tokens.Split();
        foreach (var list in tokens.Split())
        {
            stringed = new StringBuilder();
            foreach (Token token in list)
            {
                stringed.Append(token.Name);
                stringed.Append(", ");
            }
            Response.Write(stringed.ToString());
            Response.Write("</br>");
        }

So all I'm doing is using LINQ; feel free to add or remove things. You can really go crazy with this and group on many different properties.
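
As one possible illustration of grouping on more than one property, the sketch below uses an anonymous type as a composite key. The choice of name length as the second key is arbitrary, and TokenGrouping and GroupByTypeAndNameLength are hypothetical names; the Token and TokenType definitions are shortened copies of the ones above.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum TokenType { Type1, Type2 }

public class Token
{
    public string Name { get; set; }
    public TokenType TokenType { get; set; }
}

public static class TokenGrouping
{
    // Groups on a composite key: the token type plus the length of the name.
    // Anonymous types compare by value, so equal (TokenType, Length) pairs share a group.
    public static List<List<Token>> GroupByTypeAndNameLength(IEnumerable<Token> tokens)
    {
        return tokens
            .GroupBy(t => new { t.TokenType, t.Name.Length })
            .Select(g => g.ToList())
            .ToList();
    }
}
```

Keep in mind that GroupBy gathers all items with the same key into one group regardless of where they appear, so it groups by key rather than splitting at separator positions.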

Answer 8

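This is the best I could do to eliminate as much of the function's allocation time as possible (it should only allocate when it exceeds its capacity, which should be no more than what is needed to build the largest sub-list in the results). I have tested this implementation and it works as you described.

Note that the contents of the previous sub-list are destroyed when the next list in the group is accessed.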

public static IEnumerable<IEnumerable> Split(this  IEnumerable tokens, TokenType type)
{
    ArrayList currentT = new ArrayList();
    foreach (object list in tokens)
    {
        Token token = list as Token;
        if ((token != null) && token.TokenType == type)
        {
            yield return currentT;
            currentT.Clear();
            //currentT = new ArrayList(); <-- Use this instead of 'currentT.Clear();' if you want the returned lists to be a different instance
        }
        else if ((list is TokenType) && ((TokenType)list) == type)
        {
            yield return currentT;
            currentT.Clear();
            //currentT = new ArrayList(); <-- Use this instead of 'currentT.Clear();' if you want the returned lists to be a different instance
        }
        else
        {
            currentT.Add(list);
        }
    }
    // The items after the last separator form the final sub-list
    yield return currentT;
}

Edit: Here is another version that doesn't use any other lists at all (it shouldn't do any allocations). I'm not sure how well it will compare, but it does work as requested (I also have no idea how this one will behave if you try to cache a sub-'array').

Also, both of these require a "using System.Collections" statement (in addition to the Generic namespace).

private static IEnumerable SplitInnerLoop(IEnumerator iter, TokenType type)
{
    do
    {
        Token token = iter.Current as Token;
        if ((token != null) && token.TokenType == type)
        {
            break;
        }
        else if ((iter.Current is TokenType) && ((TokenType)iter.Current) == type)
        {
            break;
        }
        else
        {
            yield return iter.Current;
        }
    } while (iter.MoveNext());
}

public static IEnumerable<IEnumerable> Split(this IEnumerable tokens, TokenType type)
{
    IEnumerator iter = tokens.GetEnumerator();
    while (iter.MoveNext())
    {
        yield return SplitInnerLoop(iter, type);
    }
}

Answer 9

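Do you need to convert it to an array? You could potentially use LINQ and deferred execution to return the results.

Edit
With the clarified question, it would be hard to bend LINQ into returning the results the way you want. If you still want to defer the execution of each cycle, you could write your own enumerator.

If you try this approach, I recommend performance-testing it against the other options to see whether there is a performance gain in your scenario. It may add overhead for managing the iterator, which would hurt in cases with little data.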
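
A very rough way to make such a comparison is a Stopwatch-based helper like the sketch below. SplitBenchmark and Measure are hypothetical names, and a dedicated benchmarking tool would give more trustworthy numbers; this is only meant to show the shape of the test.

```csharp
using System;
using System.Diagnostics;

public static class SplitBenchmark
{
    // Runs the candidate repeatedly and returns the total elapsed time.
    // The candidate should fully enumerate its result; otherwise a lazy
    // implementation would appear to cost almost nothing.
    public static TimeSpan Measure(Action candidate, int iterations)
    {
        candidate(); // warm-up run so that JIT compilation is not measured

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            candidate();
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

Each candidate would be wrapped in a lambda that calls one Split variant on the same input and walks every item of every sub-sequence, and the resulting TimeSpan values can then be compared for small and large inputs.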

I hope this helps.

// This is the easy way to make your own iterator using the C# syntax
// It will return sets of separated tokens in a lazy fashion
// This sample is based on the version provided by @Ants
public static IEnumerable<IEnumerable<object>> Split(this IEnumerable<object> tokens,
                                                     TokenType type) {
    var current = new List<object>();
    foreach (var item in tokens) {
        Token token = item as Token;
        if (token != null && token.TokenType == type) {
            if (current.Count > 0) {
                yield return current;
                current = new List<object>();
            }
        }
        else {
            current.Add(item);
        }
    }
    if (current.Count > 0)
        yield return current;
}

Warning: this compiles but may still hide bugs. It's getting late.

// This is doing the same thing but doing it all by hand.
// You could use this method as well to lazily iterate through the 'current' list as well
// This is probably overkill and substantially more complex
public class TokenSplitter : IEnumerable<IEnumerable<object>>, IEnumerator<IEnumerable<object>>
{
    IEnumerator<object> _enumerator;
    IEnumerable<object> _tokens;
    TokenType _target;
    List<object> _current;
    bool _isDone = false;

    public TokenSplitter(IEnumerable<object> tokens, TokenType target)
    {
        _tokens = tokens;
        _target = target;
        Reset();
    }

    // Cruft from the IEnumerable and generic IEnumerator
    public IEnumerator<IEnumerable<object>> GetEnumerator() { return this; }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    public IEnumerable<object> Current { get { return _current; } }

    public void Dispose() { }

    #region IEnumerator Members

    object System.Collections.IEnumerator.Current { get { return Current; } }

    // See if there is anything left to get
    public bool MoveNext()
    {
        if (_isDone) return false;
        FillCurrent();
        return _current.Count > 0;
    }

    // Reset the enumerators so that you could reuse this structure if you wanted
    public void Reset()
    {
        _isDone = false;
        _enumerator = _tokens.GetEnumerator();
        _current = new List<object>();
    }

    // Fills the current set of tokens, stopping at the separator that ends it
    private void FillCurrent()
    {
        _current = new List<object>();
        while (_enumerator.MoveNext())
        {
            Token token = _enumerator.Current as Token;
            if (token == null || token.TokenType != _target)
            {
                _current.Add(_enumerator.Current);
            }
            else if (_current.Count > 0)
            {
                return; // A separator closes the current non-empty set
            }
        }
        _isDone = true; // The source is exhausted
    }

    #endregion
}

Optimized generic list split

9 answers: