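I have a restartable program that runs over a very large space, and I have started parallelizing parts of it. Each task runs independently and updates a database with its results. It doesn't matter if tasks are repeated (they are fully deterministic given the input array and will simply produce the same result as before), but repeating them is relatively inefficient. So far I have come up with the following pattern: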
static void Main(string[] args) {
    GeneratorStart = Storage.Load();
    var tasks = new List<Task>();
    foreach (int[] temp in Generator()) {
        var arr = temp;
        var task = new Task(() => {
            //... use arr as needed
        });
        task.Start();
        tasks.Add(task);
        if (tasks.Count > 4) {
            Task.WaitAll(tasks.ToArray());
            Storage.UpdateStart(temp);
            tasks = new List<Task>();
        }
    }
}
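Before making the generator restartable, I had a simple Parallel.ForEach loop over it, and it was faster. I think I am losing some CPU time in the WaitAll call. How can I get rid of this bottleneck while still keeping track of which tasks I don't have to run again when I restart?
The other bits, for those interested (shortened for brevity):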
class Program {
    static bool Done = false;
    static int[] GeneratorStart = null;
    static IEnumerable<int[]> Generator() {
        var s = new Stack<int>();
        //... omitted code to initialize stack to GeneratorStart for brevity
        yield return s.ToArray();
        while (!Done) {
            Increment(s);
            yield return s.Reverse().ToArray();
        }
    }
    static int Base = 25600;                  //example number (none of this is important
    static void Increment(Stack<int> stack) { //outside the fact
        if (stack.Count == 0) {               //that it is generating an array
            stack.Push(1);                    //of a large base
            return;                           //behaving like an integer
        }                                     //with each digit stored in an
        int i = stack.Pop();                  //array position)
        i++;
        if (i < Base) {
            stack.Push(i);
            return;
        }
        Increment(stack);
        stack.Push(0);
    }
}
Answer 0 (score: 0)
I came up with this:
var tasks = new Queue<Pair<int[], Task>>();
foreach (var temp in Generator()) {
    var arr = temp;
    tasks.Enqueue(new Pair<int[], Task>(arr, Task.Run(() => {
        //... use arr as needed
    })));
    // Tasks that are still running.
    var tArray = tasks.Select(v => v.B).Where(t => !t.IsCompleted).ToArray();
    if (tArray.Length > 7) {
        // Block only until one running task finishes, not all of them.
        Task.WaitAny(tArray);
        // Advance the restart point past every finished task at the head of the queue.
        var first = tasks.Peek();
        while (first != null && first.B.IsCompleted) {
            Storage.UpdateStart(first.A);
            tasks.Dequeue();
            first = tasks.Count == 0 ? null : tasks.Peek();
        }
    }
}
...
class Pair<TA, TB> {
    public TA A { get; set; }
    public TB B { get; set; }
    public Pair(TA a, TB b) { A = a; B = b; }
}
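For anyone who wants to try the idea in isolation, here is a minimal self-contained sketch of the same pattern: keep the tasks in generation order, and only move the restart point across the contiguous run of finished tasks at the front of the queue. The names CheckpointSketch, Run, LastCheckpoint and maxInFlight are made up for illustration; LastCheckpoint is only a stand-in for Storage.UpdateStart, and a SemaphoreSlim is swapped in for the WaitAny throttling above. Treat it as a sketch under those assumptions, not a drop-in replacement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class CheckpointSketch
{
    // Hypothetical stand-in for Storage.UpdateStart: the last safe restart point.
    public static int[] LastCheckpoint;

    public static void Run(IEnumerable<int[]> inputs, Action<int[]> work, int maxInFlight)
    {
        // Tasks in generation order, paired with the input that produced them.
        var pending = new Queue<(int[] Input, Task Task)>();
        using (var gate = new SemaphoreSlim(maxInFlight))
        {
            foreach (var input in inputs)
            {
                gate.Wait(); // blocks only while maxInFlight tasks are running
                var arr = input;
                pending.Enqueue((arr, Task.Run(() =>
                {
                    try { work(arr); }
                    finally { gate.Release(); } // free the slot even if work throws
                })));
                AdvanceCheckpoint(pending);
            }
            // Drain the stragglers; this rethrows if any task faulted,
            // in which case the checkpoint stays where it last stood.
            Task.WaitAll(pending.Select(p => p.Task).ToArray());
            AdvanceCheckpoint(pending);
        }
    }

    // Moves the checkpoint across the contiguous run of successful tasks at the
    // head of the queue. It stops at the first task that is still running or
    // that failed, so that input is generated again after a restart.
    static void AdvanceCheckpoint(Queue<(int[] Input, Task Task)> pending)
    {
        while (pending.Count > 0 && pending.Peek().Task.Status == TaskStatus.RanToCompletion)
        {
            LastCheckpoint = pending.Dequeue().Input;
        }
    }

    static void Main()
    {
        // Twenty single-element inputs, at most four running at once.
        var inputs = Enumerable.Range(0, 20).Select(i => new[] { i });
        Run(inputs, arr => Thread.Sleep(10), maxInFlight: 4);
        Console.WriteLine("Last checkpoint: " + LastCheckpoint[0]); // prints "Last checkpoint: 19"
    }
}
```

The checkpoint is written only from the producing thread, so it needs no locking. One caveat: a single slow task at the head of the queue holds the checkpoint back while later tasks keep finishing, so the queue of completed entries can grow until that task is done. Since repeated work is harmless here, restarting from a slightly stale checkpoint only costs some recomputation.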