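So here's the scenario:
I have to fetch a set of data, process it, build an object from each item, and then insert those objects into a database.
To improve performance, I use a parallel loop to process the data on multiple threads and store the resulting objects in a ConcurrentBag.
That part works fine. The problem is that I now need to take that collection, convert it into a DataTable, and insert the data into the database. It's very ugly, and I feel I'm not doing it the best way (pseudocode below):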
using System;
using System.Collections.Concurrent;
using System.Data;
using System.Threading;
using System.Threading.Tasks;

ConcurrentBag<FinalObject> bag = new ConcurrentBag<FinalObject>();
ParallelOptions parallelOptions = new ParallelOptions();
parallelOptions.MaxDegreeOfParallelism = Environment.ProcessorCount;
Parallel.ForEach(allData, parallelOptions, dataObj =>
{
    FinalObject theData = Process(dataObj); // ... process data ...
    bag.Add(theData);
    Thread.Sleep(100);
});
DataTable table = createTable();
foreach (FinalObject moveObj in bag)
{
    table.Rows.Add(moveObj.x);
}
Answer 0 (score: 1)
This is a good candidate for PLINQ (or Rx; I'll focus on PLINQ because it's part of the Base Class Library).
using System.Linq;

IEnumerable<FinalObject> bag = allData
    .AsParallel()
    .WithDegreeOfParallelism(Environment.ProcessorCount)
    .Select(dataObj =>
    {
        FinalObject theData = Process(dataObj);
        Thread.Sleep(100);
        return theData;
    });
DataTable table = createTable();
foreach (FinalObject moveObj in bag)
{
    table.Rows.Add(moveObj.x);
}
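Really, instead of throttling the loop with Thread.Sleep, you should lower the maximum degree of parallelism further until CPU usage drops to the level you want.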
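A minimal sketch of that idea, assuming the per-item work fits in a synchronous function. `Process`, the integer input range, the single `x` column and the `ProcessorCount / 2` starting value are all hypothetical placeholders to tune; the point is that the throttle is `WithDegreeOfParallelism` itself rather than a sleep inside the query, and the DataTable is filled on a single thread afterwards.

```csharp
using System;
using System.Data;
using System.Linq;
using System.Threading;

// Hypothetical stand-in for the real per-item work: ~10 ms of synchronous work.
static int Process(int input)
{
    Thread.Sleep(10);
    return input * 2;
}

// Hypothetical starting point: half the cores (at least 1). Lower it until CPU usage is acceptable.
int degree = Math.Max(1, Environment.ProcessorCount / 2);

int[] allData = Enumerable.Range(0, 100).ToArray();

// The degree of parallelism is the only throttle; there is no Thread.Sleep in the query.
int[] results = allData
    .AsParallel()
    .WithDegreeOfParallelism(degree)
    .Select(x => Process(x))
    .ToArray();

// DataTable is only safe for concurrent reads, so rows are added on one thread.
var table = new DataTable();
table.Columns.Add("x", typeof(int));
foreach (int x in results)
{
    table.Rows.Add(x);
}

Console.WriteLine($"{table.Rows.Count} rows, degree of parallelism {degree}");
```

PLINQ does not preserve input order here (no `AsOrdered()`), which is fine when the rows are only going to be inserted into a database.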
Disclaimer: everything below is just for fun, but it does work.
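Of course, you can always take it up a notch and write a full-blown asynchronous Parallel.ForEach implementation that lets you process the input in parallel and throttle asynchronously, without blocking any thread pool threads.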
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

async Task ParallelForEachAsync<TInput, TResult>(IEnumerable<TInput> input,
                                                  int maxDegreeOfParallelism,
                                                  Func<TInput, Task<TResult>> body,
                                                  Action<TResult> onCompleted)
{
    Queue<TInput> queue = new Queue<TInput>(input);
    if (queue.Count == 0) {
        return;
    }
    List<Task<TResult>> tasksInFlight = new List<Task<TResult>>(maxDegreeOfParallelism);
    do
    {
        while (tasksInFlight.Count < maxDegreeOfParallelism && queue.Count != 0)
        {
            TInput item = queue.Dequeue();
            Task<TResult> task = body(item);
            tasksInFlight.Add(task);
        }
        Task<TResult> completedTask = await Task.WhenAny(tasksInFlight).ConfigureAwait(false);
        tasksInFlight.Remove(completedTask);
        TResult result = completedTask.GetAwaiter().GetResult(); // We know the task has completed. No need for await.
        onCompleted(result);
    }
    while (queue.Count != 0 || tasksInFlight.Count != 0);
}
Usage (full Fiddle here):
async Task<DataTable> ProcessAllAsync(IEnumerable<InputObject> allData)
{
    DataTable table = CreateTable();
    int maxDegreeOfParallelism = Environment.ProcessorCount;
    await ParallelForEachAsync(
        allData,
        maxDegreeOfParallelism,
        // Loop body: these Tasks will run in parallel, up to {maxDegreeOfParallelism} at any given time.
        async dataObj =>
        {
            FinalObject o = await Task.Run(() => Process(dataObj)).ConfigureAwait(false); // Thread pool processing.
            await Task.Delay(100).ConfigureAwait(false); // Artificial throttling.
            return o;
        },
        // Completion handler: these will be executed one at a time, and can safely mutate shared state.
        moveObj => table.Rows.Add(moveObj.x)
    );
    return table;
}
struct InputObject
{
    public int x;
}
struct FinalObject
{
    public int x;
}
FinalObject Process(InputObject o)
{
    // Simulate synchronous work.
    Thread.Sleep(100);
    return new FinalObject { x = o.x };
}
Same behavior, but without Thread.Sleep throttling and without ConcurrentBag<T>.
Answer 1 (score: 0)
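I think something like this should give better performance. object[] looks like a better choice than DataRow, because you need the DataTable to get DataRow objects (rows are created with DataTable.NewRow()).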
ConcurrentBag<object[]> bag = new ConcurrentBag<object[]>();
Parallel.ForEach(allData,
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    dataObj =>
    {
        object[] row = new object[colCount]; // colCount: number of columns in the target DataTable
        // do processing: put the value for column i into row[i]
        bag.Add(row);
        Thread.Sleep(100);
    });
DataTable table = createTable();
foreach (object[] row in bag)
{
    table.Rows.Add(row);
}
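A runnable version of the object[] approach, assuming a hypothetical two-column table (`Id`, `Name`) and integer input. Creating the DataTable before the loop gives `colCount` a concrete value; the parallel body only fills plain arrays, and only the final single-threaded loop touches the table, because DataTable is documented as safe for concurrent reads but not writes.

```csharp
using System;
using System.Collections.Concurrent;
using System.Data;
using System.Linq;
using System.Threading.Tasks;

// Build the (hypothetical) table first so the column count is known inside the loop.
var table = new DataTable();
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Name", typeof(string));
int colCount = table.Columns.Count;

var allData = Enumerable.Range(1, 50); // hypothetical input
var bag = new ConcurrentBag<object[]>();

Parallel.ForEach(
    allData,
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    id =>
    {
        // Each worker builds a plain object[]; it never touches the DataTable.
        object[] row = new object[colCount];
        row[0] = id;
        row[1] = "item " + id;
        bag.Add(row);
    });

// Single-threaded: Rows.Add(object[]) maps array elements to columns by position.
foreach (object[] row in bag)
{
    table.Rows.Add(row);
}

Console.WriteLine(table.Rows.Count);
```

ConcurrentBag<T> does not preserve insertion order, so the rows end up in arbitrary order; sort afterwards if order matters.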
Answer 2 (score: 0)
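Running everything in parallel sounds complicated, but if you store DataRow objects in the bag instead of plain objects, at the end you can use DataTableExtensions to create a DataTable from the generic collection quite easily:
var dataTable = bag.CopyToDataTable();
Just add a reference to the assembly that provides DataTableExtensions in your project.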
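A sketch of the DataRow variant, assuming a hypothetical template table `Results` with a single int column `x`. Rows come from `NewRow()` on the template but are never added to it, so they stay Detached; `CopyToDataTable` (an extension method on `DataTableExtensions`, defined in the System.Data.DataSetExtensions assembly) then copies them into a new table whose schema is taken from the first row. Because `NewRow()` allocates storage inside the template table and DataTable only guarantees thread-safe reads, this sketch serializes row creation with a lock and keeps the actual processing outside it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Data;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical template table: it only supplies the schema for NewRow().
var template = new DataTable("Results");
template.Columns.Add("x", typeof(int));

var bag = new ConcurrentBag<DataRow>();
object sync = new object();

Parallel.ForEach(Enumerable.Range(0, 100), i =>
{
    int value = i * i; // hypothetical processing, done outside the lock

    // NewRow() and the field assignment write to the template's internal storage,
    // so they are serialized; the rows are never added to the template.
    lock (sync)
    {
        DataRow row = template.NewRow();
        row["x"] = value;
        bag.Add(row);
    }
});

// Copies the detached rows into a brand-new table.
DataTable dataTable = bag.CopyToDataTable();
Console.WriteLine(dataTable.Rows.Count);
```

If most of the time is spent in the processing step, the lock costs little; if row creation itself dominates, the object[] approach from the previous answer avoids the lock entirely.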