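I'm using a third-party library to iterate over some very large flat files, which can take a long time. The library provides an enumerator, so you can yield each result and process it while the enumerator extracts the next item from the flat file.
For example: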
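IEnumerable<object> GetItems()
{
    var cursor = new Cursor();
    try
    {
        cursor.Open();
        while (!cursor.EOF)
        {
            yield return new object(); // placeholder: build the item from the current record
            cursor.MoveNext();
        }
    }
    finally
    {
        // Runs when the enumeration completes or the enumerator is disposed.
        if (cursor.IsOpen)
        {
            cursor.Close();
        }
    }
}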
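What I want is two consumers of the same enumerable, so that I don't have to extract the information twice, while each consumer can still process every item as soon as it arrives instead of waiting for all of them to arrive at once.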
IEnumerable<object> items = GetItems();
new Thread(() => SaveToDatabase(items)).Start();
new Thread(() => SaveSomewhereElse(items)).Start();
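I think what I'm trying to achieve is:
"If the item the consumer asks for has already been fetched, yield it; otherwise call MoveNext() and wait." But I realize there could be MoveNext() conflicts between the two threads.
Does something like this already exist? If not, does anyone have ideas on how to implement it?
Thanks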
Answer 0 (score: 5)
A Pipelines pattern implementation using the .NET 4 BlockingCollection<T>
and TPL Tasks is what you're looking for. See my answer with a complete example in this StackOverflow post.
Example: 3 concurrent consumers
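using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public class Pipeline
{
    private readonly BlockingCollection<string> queue = new BlockingCollection<string>();

    public void Start()
    {
        var producerWorker = Task.Factory.StartNew(() => ProducerImpl());
        var consumer1 = Task.Factory.StartNew(() => ConsumerImpl());
        var consumer2 = Task.Factory.StartNew(() => ConsumerImpl());
        var consumer3 = Task.Factory.StartNew(() => ConsumerImpl());
        Task.WaitAll(producerWorker, consumer1, consumer2, consumer3);
    }

    private void ProducerImpl()
    {
        try
        {
            // 1. Read raw data from a file
            // 2. Preprocess it
            // 3. Add each item to the queue
            foreach (string item in ReadItemsFromFile())
            {
                queue.Add(item);
            }
        }
        finally
        {
            // Tell the consumers that no more items are coming; otherwise
            // GetConsumingEnumerable() blocks forever once the queue is empty.
            queue.CompleteAdding();
        }
    }

    // ConsumerImpl must be thread safe
    // to allow launching multiple consumers simultaneously
    private void ConsumerImpl()
    {
        foreach (var item in queue.GetConsumingEnumerable())
        {
            // TODO: process the item
        }
    }

    // Hypothetical placeholder for the third-party flat-file reader.
    private IEnumerable<string> ReadItemsFromFile()
    {
        yield break;
    }
}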
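A single shared queue, as in the example above, hands each item to exactly one of the consumers. If every consumer must see every item, as the question asks, one option is to give each consumer its own BlockingCollection<T> and have the producer add every item to all of them. The sketch below assumes that design; `FanOutPipeline`, `Run` and the `boundedCapacity` parameter are illustrative names, not part of any library. Bounding each queue with the `BlockingCollection<T>(int boundedCapacity)` constructor makes `Add` block when a consumer falls behind, so the file is still read only once without buffering all of it in memory.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class FanOutPipeline
{
    // Hypothetical helper: enumerates `source` once and delivers every item
    // to every consumer. boundedCapacity must be greater than zero.
    public static void Run<T>(IEnumerable<T> source, int boundedCapacity,
                              params Action<T>[] consumers)
    {
        // One bounded queue per consumer, so each consumer sees the full sequence.
        var queues = consumers
            .Select(_ => new BlockingCollection<T>(boundedCapacity))
            .ToArray();

        var producer = Task.Factory.StartNew(() =>
        {
            try
            {
                foreach (var item in source)      // the single pass over the source
                    foreach (var queue in queues)
                        queue.Add(item);          // blocks while that queue is full
            }
            finally
            {
                // Lets each GetConsumingEnumerable() loop finish instead of blocking forever.
                foreach (var queue in queues)
                    queue.CompleteAdding();
            }
        }, TaskCreationOptions.LongRunning);

        // Each consumer drains only its own queue, in FIFO order.
        var consumerTasks = consumers.Select((consume, i) => Task.Factory.StartNew(() =>
        {
            foreach (var item in queues[i].GetConsumingEnumerable())
                consume(item);
        }, TaskCreationOptions.LongRunning)).ToArray();

        Task.WaitAll(consumerTasks.Concat(new[] { producer }).ToArray());
    }
}
```

For example, `FanOutPipeline.Run(GetItems(), 100, saveToDatabase, saveSomewhereElse)` would read the file once and feed both handlers, where the two delegates are hypothetical per-item versions of the question's save methods (`Action<object>`). Error handling is left out: if a consumer throws, the producer can block forever on that consumer's full queue, so real code would likely pass a CancellationToken to the `Add(item, cancellationToken)` overload.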
Let me know if anything is still unclear.
[Diagram: high-level overview of the pipeline process]
Answer 1 (score: 3)
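Basically, what you want is to cache the data of an IEnumerable<T>
without having to wait for the whole sequence to finish before storing it. You can do it like this: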
using System.Collections;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    public static IEnumerable<T> Cache<T>(this IEnumerable<T> source)
    {
        return new CacheEnumerator<T>(source);
    }

    private class CacheEnumerator<T> : IEnumerable<T>
    {
        private readonly CacheEntry<T> cacheEntry;

        public CacheEnumerator(IEnumerable<T> sequence)
        {
            cacheEntry = new CacheEntry<T>(sequence.GetEnumerator());
        }

        // Every caller (on any thread) gets its own position but shares the cache.
        public IEnumerator<T> GetEnumerator()
        {
            return iterateSequence(cacheEntry).GetEnumerator();
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return this.GetEnumerator();
        }
    }

    private static IEnumerable<T> iterateSequence<T>(CacheEntry<T> entry)
    {
        T value;
        for (int i = 0; entry.TryGetItemAt(i, out value); i++)
        {
            yield return value;
        }
    }

    private class CacheEntry<T>
    {
        private readonly IEnumerator<T> sequence;
        // Every item stays here for the lifetime of the cache, so memory
        // grows with the size of the source sequence.
        private readonly List<T> cachedValues = new List<T>();
        // One lock per cache instance (a static field would be shared by every
        // cache with the same T).
        private readonly object key = new object();
        private bool fullyPopulated;

        public CacheEntry(IEnumerator<T> sequence)
        {
            this.sequence = sequence;
        }

        /// <summary>
        /// Gets the item at the provided index. If the cache doesn't have it yet, takes
        /// the next item from the input sequence and moves it to the cache.
        ///
        /// The method is thread safe. Every access to the cache happens under the lock,
        /// because a List is not safe to read while another thread is adding to it.
        /// </summary>
        /// <returns>True if the item was already cached or was just moved to the cache,
        /// false if there are no more items in the sequence.</returns>
        public bool TryGetItemAt(int index, out T value)
        {
            lock (key)
            {
                // We already have the cached item.
                if (index < cachedValues.Count)
                {
                    value = cachedValues[index];
                    return true;
                }

                // Each consumer asks for consecutive indexes, so here index == Count
                // and we need exactly one more item from the sequence.
                if (!fullyPopulated)
                {
                    if (sequence.MoveNext())
                    {
                        value = sequence.Current;
                        cachedValues.Add(value);
                        return true;
                    }

                    // Disposing runs the source's finally block (e.g. closes the cursor).
                    sequence.Dispose();
                    fullyPopulated = true;
                }

                value = default(T);
                return false;
            }
        }
    }
}
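Example usage:
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class Program
{
    private static IEnumerable<int> interestingIntGenerationMethod(int maxValue)
    {
        for (int i = 0; i < maxValue; i++)
        {
            Thread.Sleep(1000); // simulate a slow read
            Console.WriteLine("actually generating value: {0}", i);
            yield return i;
        }
    }

    public static void Main(string[] args)
    {
        // Each value is generated once and then printed by all three tasks.
        IEnumerable<int> sequence = interestingIntGenerationMethod(10)
            .Cache();
        int numThreads = 3;
        for (int i = 0; i < numThreads; i++)
        {
            int taskID = i; // copy the loop variable so each lambda captures its own value
            Task.Factory.StartNew(() =>
            {
                foreach (int value in sequence)
                {
                    Console.WriteLine("Task: {0} Value: {1}",
                        taskID, value);
                }
            });
        }
        Console.WriteLine("Press any key to exit...");
        Console.ReadKey(true);
    }
}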