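I have to loop over a large list (1,500,000 items), and for each item I have to do a very small check. In total this takes about 30 seconds.
CPU utilization with the sequential loop is around 10%, so a lot of resources are going unused.
My first thought was to use Parallel, but because each item takes so little time, Parallel ends up slower than the sequential foreach. The reason is given in "Why was the parallel version slower than the sequential version in this example?", which explains that creating each task costs time.
So I had another idea: divide the list into 4 (or more) equal pieces and create a thread for each piece to loop through its items, to make it faster.
Before I create my own class, is this a good approach? Any other thoughts on how to speed things up? Or do you know a better way of handling this?
Code I created for this alternative parallel approach (used in my own static class):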
    public static void ForEach<T>(IEnumerable<T> list, Action<T> body, int listDevide)
    {
        // Number of items
        int items = list.Count();
        // Items per part (integer division, so floored)
        int listPart = items / listDevide;
        // Leftover items for the last run
        int rest = items % listDevide;
        // List to save the actions
        var actions = new List<Action>();
        for (var x = 0; x < listDevide; x++)
        {
            // Copy the offset into a local: the delegate would otherwise capture the shared
            // loop variable x, which already equals listDevide by the time the actions run.
            int start = x * listPart;
            // Create the actions
            actions.Add(delegate
            {
                foreach (var item in list.Skip(start).Take(listPart))
                {
                    body.Invoke(item);
                }
            });
        }
        // Run the actions in parallel
        Parallel.Invoke(actions.ToArray());
    }
Note: the "rest" variable is not yet used in this example to process the last items.
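For what it's worth, one way the leftover items could be handled is to compute index ranges up front and let the last range absorb the remainder. This is only a minimal sketch; the class name `ChunkRanges` and the method `Split` are hypothetical names made up for illustration, not part of the code above.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkRanges
{
    // Splits [0, itemCount) into `parts` contiguous ranges (start inclusive, end exclusive).
    // Every range holds itemCount / parts items, except the last one, which also
    // takes the itemCount % parts leftover items so nothing is dropped.
    public static List<Tuple<int, int>> Split(int itemCount, int parts)
    {
        if (itemCount < 0) throw new ArgumentOutOfRangeException("itemCount");
        if (parts <= 0) throw new ArgumentOutOfRangeException("parts");

        var ranges = new List<Tuple<int, int>>();
        int size = itemCount / parts;
        int start = 0;
        for (int p = 0; p < parts; p++)
        {
            // The last range always ends at itemCount.
            int end = (p == parts - 1) ? itemCount : start + size;
            ranges.Add(Tuple.Create(start, end));
            start = end;
        }
        return ranges;
    }
}
```

For example, `ChunkRanges.Split(10, 4)` yields the ranges (0, 2), (2, 4), (4, 6) and (6, 10). Each action could then loop over its own index range directly, which would also avoid re-enumerating the list with Skip and Take for every part.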
Solution below; more information: http://msdn.microsoft.com/en-us/library/dd997411.aspx
Answer 0 (score: 6)
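Yes, partitioning the input array is a good approach.
In fact, Microsoft provides a Partitioner class to help with exactly this approach.
Here is an example showing how to do it: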
    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Threading.Tasks;

    namespace Demo
    {
        class Program
        {
            private void run()
            {
                double sum = 0;
                Func<double, double> func = x => Math.Sqrt(Math.Sin(x));
                object locker = new object();

                double[] data = testData();

                // For each double in data[] we are going to calculate Math.Sqrt(Math.Sin(x)) and
                // add all the results together.
                //
                // To do this, we use class Partitioner to split the input array into just a few partitions,
                // (the Partitioner will use knowledge about the number of processor cores to optimize this)
                // and then add up all the values using a separate thread for each partition.
                //
                // We use threadLocalState to compute the total for each partition, and then we have to
                // add all these together to get the final sum. We must lock the addition because it isn't
                // threadsafe, and several threads could be doing it at the same time.

                Parallel.ForEach
                (
                    Partitioner.Create(0, data.Length),

                    () => 0.0,

                    (subRange, loopState, threadLocalState) =>
                    {
                        for (int i = subRange.Item1; i < subRange.Item2; i++)
                        {
                            threadLocalState += func(data[i]);
                        }

                        return threadLocalState;
                    },

                    finalThreadLocalState =>
                    {
                        lock (locker)
                        {
                            sum += finalThreadLocalState;
                        }
                    }
                );

                Console.WriteLine("Sum = " + sum);
            }

            private static double[] testData()
            {
                double[] array = new double[1000003]; // Test with an odd number of values.

                Random rng = new Random(12345);

                for (int i = 0; i < array.Length; ++i)
                    array[i] = rng.Next() & 3; // Don't want large values for this simple test.

                return array;
            }

            static void Main()
            {
                new Program().run();
            }
        }
    }
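If no per-partition result is needed, the same idea can probably be folded back into the shape of the helper from the question. The following is a rough sketch under a few assumptions: the class name `PartitionedLoop` is hypothetical, the input is taken as an `IList<T>` (rather than `IEnumerable<T>`) so items can be read by index, the list is not modified while the loop runs, and `body` is safe to call from several threads at once.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PartitionedLoop
{
    // Runs `body` once per item, handing each worker a contiguous index range
    // instead of scheduling one delegate call per item.
    public static void ForEach<T>(IList<T> list, Action<T> body)
    {
        if (list == null) throw new ArgumentNullException("list");
        if (body == null) throw new ArgumentNullException("body");

        // Partitioner.Create(from, to) rejects an empty range, so return early.
        if (list.Count == 0) return;

        Parallel.ForEach(Partitioner.Create(0, list.Count), range =>
        {
            // range.Item1 is inclusive, range.Item2 is exclusive.
            for (int i = range.Item1; i < range.Item2; i++)
            {
                body(list[i]);
            }
        });
    }
}
```

A call would look like `PartitionedLoop.ForEach(items, item => Check(item));`, where `items` and `Check` stand in for your own list and per-item check. The Partitioner picks the range sizes itself, so there is no part count to pass in and no remainder to handle by hand.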