Partitioning IEnumerable<T> results

Date: 2014-01-13 03:04:31

Tags: c# loops

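I'm using a method that runs a long computation and returns many results, but the right result could be any one of them. Say it turns up after 300,000 results; the remaining 700,000 would then never be needed. Whether a result is the right one is checked in the following code: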

//A method that is supposed to return a value when one is needed.
//Main might need only a few of the results, not all of them.
static IEnumerable<int> foo() {
    //Long recursive process; it might produce over 1 million results if asked to yield them all.
    yield return ret; //ret: placeholder for each computed result
}

static void Main(string[] args) {
    var a = foo();
    while (true) {
        var p = a.Take(300); //takes first 300 every loop in the while-loop
        foreach (var c in p) {
            //does something with it        
            if (bar == true) //if it is the right one:
                goto _break;            
        }
    }
    _break:
    Console.Read(); //pause
}

Unfortunately, the code recomputes the same first 300 results again and again.
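Why the restart happens: the `IEnumerable<T>` returned by an iterator method is only a recipe, and every `Take`, `foreach` or other LINQ call asks it for a new enumerator, which runs the method body again from the top. A minimal sketch, assuming a hypothetical `Numbers()` generator that counts how often its body starts:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class RestartDemo
{
    // Counts how many times the iterator body has started executing.
    public static int Starts;

    // Hypothetical stand-in for the question's foo(): yields 1..9.
    static IEnumerable<int> Numbers()
    {
        Starts++; // runs on the first MoveNext() of each new enumeration
        for (int i = 1; i <= 9; i++)
            yield return i;
    }

    // Mirrors the question's loop: Take(3) on the same IEnumerable, three times.
    public static List<int> TakeThreeTimes()
    {
        var seen = new List<int>();
        IEnumerable<int> a = Numbers();
        for (int round = 0; round < 3; round++)
            seen.AddRange(a.Take(3)); // each Take enumerates 'a' from the start
        return seen;
    }

    static void Main()
    {
        List<int> seen = TakeThreeTimes();
        Console.WriteLine(string.Join(",", seen)); // 1,2,3,1,2,3,1,2,3
        Console.WriteLine(Starts);                 // 3
    }
}
```

Keeping progress between batches therefore means holding on to one enumerator, which is what all three answers below do in different ways.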

My question

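How can I take only 300 results at a time without starting over from the beginning each time (i.e. using Skip(n) and then Take(n)) and without converting the results into a collection, given that the IEnumerable's state is kept inside the function foo?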

What am I trying to do?

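Before I started using yield, I had a linear, inefficient program that was faster than the new one. Nothing really changed except that I moved the contents of foo() into a separate method, so that I could yield results one by one instead of first collecting them all and then processing them. However, the performance is terrible: we're talking about going from 300 ms to 700 ms. I noticed that asking for all the results (foo().ToArray()) is even faster than using yield return and checking whether bar == true.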

So what I want is to take 300 results, check them, and if the right one isn't among them, keep taking 300 at a time until it is found.
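If the only goal is to stop at the first result that passes the check, a lazy iterator may not need batching at all: a single pass such as `First(predicate)` (or a plain `foreach` with `break`) pulls items from one enumerator and stops producing them once the match is found. A minimal sketch, assuming a hypothetical `Candidates()` generator standing in for `foo()` and a hypothetical `IsRightOne` check standing in for `bar == true`:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SinglePassSearch
{
    // Counts how many values the generator has actually produced.
    public static int Produced;

    // Hypothetical stand-in for the question's expensive foo().
    static IEnumerable<int> Candidates()
    {
        for (int i = 1; i <= 1_000_000; i++)
        {
            Produced++; // stands in for the expensive work per result
            yield return i;
        }
    }

    // Hypothetical stand-in for the question's "bar == true" check.
    static bool IsRightOne(int value) => value == 300;

    public static int FindFirst()
    {
        // First(predicate) walks ONE enumerator and stops at the first match;
        // the rest of the sequence is never generated.
        return Candidates().First(IsRightOne);
    }

    static void Main()
    {
        Console.WriteLine(FindFirst()); // 300
        Console.WriteLine(Produced);    // 300
    }
}
```

`First` throws `InvalidOperationException` when nothing matches; `FirstOrDefault` returns `default(int)` instead.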

Illustrative code

static void Main(string[] args) {
    var a = loly();
    while(true){
        var p = a.Take(3);
        foreach (var c in p) {
            Console.Write(c);
            if (c==4)
                goto _break;
        }
    }

    _break:
    Console.Read();
}

static IEnumerable<int> loly() {
    var l = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    for (int i = 0; i < 9; i++) {
        yield return l[i];
    }            
} 

This outputs 123123123... and so on: every pass starts again at 1, so c never reaches 4 and the loop never ends.

The answers put into practice

using System;
using System.Collections.Generic;

class Program {
    static void Main(string[] args) {
        var j = 0;
        var a = new EnumerationPartitioner<int>(loly().GetEnumerator());
        while(true) {
            foreach (var c in a.Pull(3)) {
                Console.WriteLine(c);
                Console.WriteLine("("+(++j)+")");
            }
            if (a.Ended)
                break;
        }

        foreach (var part in loly().ToInMemoryBatches(7)) {
            foreach (var c in part) {
                Console.WriteLine(c);
                Console.WriteLine("("+(++j)+")");
            }
        }



        Console.Read();
    }

    static IEnumerable<int> loly() {
        var l = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        for (int i = 0; i < 9; i++) {
            yield return l[i];
        }            
    } 
}

//Tallseth's method
public static class EnumerationPartitioner {
    public static IEnumerable<IEnumerable<T>> ToInMemoryBatches<T>(this IEnumerable<T> source, int batchSize) {
        List<T> batch = null;
        foreach (var item in source)
        {
            if (batch == null)
                batch = new List<T>();

            batch.Add(item);

            if (batch.Count != batchSize)
                continue;

            yield return batch;
            batch = null;
        }

        if (batch != null)
            yield return batch;
    }
}

//MarcinJuraszek's method
public class EnumerationPartitioner<T> : IEnumerable<T> {

    /// <summary>
    /// Has the enumeration ended?
    /// </summary>
    public bool Ended {
        get { return over; }
    }

    public IEnumerator<T> Enumerator { get; private set; }

    public EnumerationPartitioner(IEnumerator<T> _enum) {
        Enumerator = _enum;
    }

    /// <summary>
    /// Has the enumeration ended
    /// </summary>
    private bool over = false;

    /// <summary>
    /// Number of items pulled from the <see cref="Enumerator"/> so far
    /// </summary>
    private int n = 0;

    /// <summary>
    /// Pulls up to <paramref name="count"/> items out of the <see cref="Enumerator"/>.
    /// Returns fewer items (possibly none) once the enumeration has ended.
    /// </summary>
    /// <param name="count">Number of items to pull out of the <see cref="Enumerator"/></param>
    public List<T> Pull(int count) {
        var l = new List<T>();
        if (over) return l;
        for (int i = 0; i < count; i++, n++) {
            if (!Enumerator.MoveNext()) {
                over = true;
                return l;
            }
            l.Add(Enumerator.Current);
        }
        return l;
    }

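    /// <summary>
    /// Resets the Enumerator and clears internal counters; use this instead of resetting manually.
    /// Note: enumerators produced by iterator methods (yield return) throw
    /// NotSupportedException from Reset(), so this only works for enumerators that implement it.
    /// </summary>
    public void Reset() {
        Enumerator.Reset(); // reset first, so the counters stay valid if this throws
        n = 0;
        over = false;
    }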


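    // Note: both GetEnumerator overloads hand out the same shared enumerator,
    // so a foreach over this object continues from wherever Pull() stopped,
    // and foreach disposes that shared enumerator when the loop finishes.
    public IEnumerator<T> GetEnumerator() {
        return Enumerator;
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() {
        return Enumerator;
    }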
}
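One caveat about `Reset()` above: enumerators created by iterator methods such as `loly()` do not support resetting; the C# compiler generates a `Reset` that throws `NotSupportedException`. A small sketch that checks this (the `ResetDemo` class and its `Loly()` copy are hypothetical):

```csharp
using System;
using System.Collections.Generic;

static class ResetDemo
{
    // Same shape as the question's loly().
    static IEnumerable<int> Loly()
    {
        for (int i = 1; i <= 9; i++)
            yield return i;
    }

    // Returns true when Reset() on a compiler-generated iterator throws.
    public static bool ResetThrows()
    {
        using (IEnumerator<int> e = Loly().GetEnumerator())
        {
            e.MoveNext();
            try
            {
                e.Reset();
                return false;
            }
            catch (NotSupportedException)
            {
                return true;
            }
        }
    }

    static void Main()
    {
        Console.WriteLine(ResetThrows()); // True
    }
}
```

To start over with such a source, building a new partitioner from a fresh `loly().GetEnumerator()` avoids `Reset` entirely.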

3 Answers:

Answer 0 (score: 3)

I need to do this regularly. As Alexei hinted, what I want when dealing with this shape of problem is a sequence of batches, each held in memory:

    public static IEnumerable<IEnumerable<T>> ToInMemoryBatches<T>(this IEnumerable<T> source, int batchSize)
    {
        List<T> batch = null;
        foreach (var item in source)
        {
            if (batch == null)
                batch = new List<T>();

            batch.Add(item);

            if (batch.Count != batchSize)
                continue;

            yield return batch;
            batch = null;
        }

        if (batch != null)
            yield return batch;
    }
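On .NET 6 and later, the base class library ships `Enumerable.Chunk`, which does the same job as `ToInMemoryBatches` but returns arrays. Because both are lazy, leaving the loop early stops the source from producing further items. A minimal sketch, assuming .NET 6+ and a hypothetical `Loly()` instrumented with a `Produced` counter:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ChunkDemo
{
    // Counts how many items the source has produced.
    public static int Produced;

    // Same data as the question's loly(), instrumented with a counter.
    static IEnumerable<int> Loly()
    {
        for (int i = 1; i <= 9; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Returns the index of the batch that contains 'target', or -1 if none does.
    public static int FindBatchContaining(int target, int batchSize)
    {
        int batchIndex = 0;
        foreach (int[] batch in Loly().Chunk(batchSize)) // Enumerable.Chunk: .NET 6+
        {
            if (Array.IndexOf(batch, target) >= 0)
                return batchIndex; // leaving the foreach disposes the underlying iterator
            batchIndex++;
        }
        return -1;
    }

    static void Main()
    {
        Console.WriteLine(FindBatchContaining(4, 3)); // 1 (the batch 4,5,6)
        Console.WriteLine(Produced);                  // 6: items 7..9 were never generated
    }
}
```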

Answer 1 (score: 2)

Instead of relying on a foreach loop, you can use the enumerator directly:

static void Main(string[] args)
{
    var a = loly();
    var partitionSize = 3;

    using (var enumerator = a.GetEnumerator())
    {
        var values = new List<int>(partitionSize);
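        // Keep pulling batches from the same enumerator until it runs dry,
        // instead of a fixed number of rounds.
        while (true)
        {
            values.Clear();
            for (int j = 0; j < partitionSize && enumerator.MoveNext(); j++)
            {
                values.Add(enumerator.Current);
            }

            if (values.Count == 0)
                break; // source exhausted

            foreach (var c in values)
            {
                Console.Write(c);
            }
        }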
    }

    Console.Read();
}
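The same enumerator-driven loop can be combined with the question's early exit. The sketch below (the helper names `FillBatch` and `ScanUntil` are hypothetical) pulls batches of `size` items from one enumerator, stops at the first match, and records every item it examined, so it is visible that nothing is re-read:

```csharp
using System;
using System.Collections.Generic;

static class BatchSearch
{
    static IEnumerable<int> Loly()
    {
        for (int i = 1; i <= 9; i++)
            yield return i;
    }

    // Fills 'buffer' with up to 'size' items from 'e'; returns false when nothing was read.
    static bool FillBatch<T>(IEnumerator<T> e, int size, List<T> buffer)
    {
        buffer.Clear();
        while (buffer.Count < size && e.MoveNext())
            buffer.Add(e.Current);
        return buffer.Count > 0;
    }

    // Scans batches of 'size' items from ONE enumerator until 'target' is found.
    // Returns the items examined, in order.
    public static List<int> ScanUntil(int target, int size)
    {
        var examined = new List<int>();
        var buffer = new List<int>(size);
        using (IEnumerator<int> e = Loly().GetEnumerator())
        {
            while (FillBatch(e, size, buffer))
            {
                foreach (int c in buffer)
                {
                    examined.Add(c);
                    if (c == target)
                        return examined; // 'using' disposes the iterator on early exit
                }
            }
        }
        return examined;
    }

    static void Main()
    {
        Console.WriteLine(string.Join("", ScanUntil(4, 3))); // 1234
    }
}
```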

Answer 2 (score: 0)

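I wrote two methods for partitions whose sizes are not fixed: one takes the partition sizes, the other takes the end index of each partition. If the last partition is not full, it is shrunk to the number of items actually read.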

    public static IEnumerable<T[]> PartitionBySize<T>(this IEnumerable<T> source, int[] sizes)
    {
        using (var iter = source.GetEnumerator())
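            foreach (var size in sizes)
            {
                // A size of 0 or less would make the array code below throw.
                if (size <= 0)
                    throw new ArgumentOutOfRangeException(nameof(sizes), "Partition sizes must be positive.");
                if (iter.MoveNext())
                {
                    var chunk = new T[size];
                    chunk[0] = iter.Current;
                    int i = 1;
                    for (; i < size && iter.MoveNext(); i++)
                        chunk[i] = iter.Current;
                    if (i < size)
                        Array.Resize(ref chunk, i);
                    yield return chunk;
                }
                else
                    yield break;
            }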
    }

    public static IEnumerable<T[]> PartitionByIdx<T>(this IEnumerable<T> source, int[] indexes)
    {
        int last = -1;
        using (var iter = source.GetEnumerator())
            foreach (var idx in indexes)
            {
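                int size = idx - last;
                // Indexes must be strictly increasing; otherwise size would be 0 or negative.
                if (size <= 0)
                    throw new ArgumentOutOfRangeException(nameof(indexes), "Indexes must be strictly increasing.");
                last = idx;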
                if (iter.MoveNext())
                {
                    var chunk = new T[size];
                    chunk[0] = iter.Current;
                    int i = 1;
                    for (; i < size && iter.MoveNext(); i++)
                        chunk[i] = iter.Current;
                    if (i < size)
                        Array.Resize(ref chunk, i);
                    yield return chunk;
                }
                else
                    yield break;
            }
    }
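A usage sketch for `PartitionBySize`, with the method repeated (plus a positive-size check) so the block compiles on its own; `PartitionDemo` and `Describe` are hypothetical names. Note that only as many items as the sizes add up to are ever read; anything after that is left unenumerated:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PartitionDemo
{
    // Answer 2's PartitionBySize, repeated with braces and a size check.
    public static IEnumerable<T[]> PartitionBySize<T>(this IEnumerable<T> source, int[] sizes)
    {
        using (var iter = source.GetEnumerator())
        {
            foreach (var size in sizes)
            {
                if (size <= 0)
                    throw new ArgumentOutOfRangeException(nameof(sizes), "Partition sizes must be positive.");
                if (!iter.MoveNext())
                    yield break; // source exhausted before the sizes ran out

                var chunk = new T[size];
                chunk[0] = iter.Current;
                int i = 1;
                for (; i < size && iter.MoveNext(); i++)
                    chunk[i] = iter.Current;
                if (i < size)
                    Array.Resize(ref chunk, i); // last partition was not full
                yield return chunk;
            }
        }
    }

    // Formats partitions like "1,2 | 3,4,5".
    public static string Describe(IEnumerable<int[]> parts) =>
        string.Join(" | ", parts.Select(p => string.Join(",", p)));

    static void Main()
    {
        var data = Enumerable.Range(1, 9);
        Console.WriteLine(Describe(data.PartitionBySize(new[] { 2, 3, 10 }))); // 1,2 | 3,4,5 | 6,7,8,9
        Console.WriteLine(Describe(data.PartitionBySize(new[] { 2, 2 })));     // 1,2 | 3,4
    }
}
```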