What is the best way to convert a thread-safe collection to a DataTable?

Time: 2016-03-23 20:47:52

Tags: c# multithreading

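So here is the scenario:

I have to take a set of data, process it, build objects from it, and then insert those objects into a database.

To improve performance, I process the data on multiple threads with a parallel loop and store the resulting objects in a ConcurrentBag.

That part works fine. The problem is that I now need to take that collection, convert it to a DataTable object, and insert the data into the database. It is very ugly, and I feel I am not doing this in the best way (pseudocode below):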

ConcurrentBag<FinalObject> bag = new ConcurrentBag<FinalObject>();

ParallelOptions parallelOptions = new ParallelOptions();
parallelOptions.MaxDegreeOfParallelism = Environment.ProcessorCount;

Parallel.ForEach(allData, parallelOptions, dataObj =>
{   
    // ... process dataObj and build theData (a FinalObject) ...

    bag.Add(theData);

    Thread.Sleep(100);
});

DataTable table = createTable();
foreach(FinalObject moveObj in bag) {
    table.Rows.Add(moveObj.x);
}

3 Answers:

Answer 0 (score: 1)

This is a good candidate for PLINQ (or Rx; I will focus on PLINQ because it is part of the Base Class Library).

IEnumerable<FinalObject> bag = allData
    .AsParallel()
    .WithDegreeOfParallelism(Environment.ProcessorCount)
    .Select(dataObj =>
    {
        FinalObject theData = Process(dataObj);

        Thread.Sleep(100);

        return theData;
    });

DataTable table = createTable();

foreach (FinalObject moveObj in bag)
{
    table.Rows.Add(moveObj.x);
}

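In practice, rather than throttling the loop with Thread.Sleep, you should lower the maximum degree of parallelism further until CPU usage drops to the desired level.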
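As a rough sketch of that advice, the PLINQ query can drop Thread.Sleep entirely and cap the number of workers instead. The class name, the Process stub, and the single-column table are assumptions made for illustration, not part of the original code; the right limit has to be found by watching CPU usage.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

static class ThrottledPlinqSketch
{
    // Hypothetical stand-in for the real processing step.
    static int Process(int input) => input * 2;

    public static DataTable BuildTable(IEnumerable<int> allData, int maxDegreeOfParallelism)
    {
        // Assumed schema: a single int column named "x".
        DataTable table = new DataTable();
        table.Columns.Add("x", typeof(int));

        // Processing runs on at most maxDegreeOfParallelism workers; no sleeping involved.
        ParallelQuery<int> results = allData
            .AsParallel()
            .WithDegreeOfParallelism(maxDegreeOfParallelism)
            .Select(Process);

        // The foreach body runs on the calling thread only,
        // so the DataTable is never written to concurrently.
        foreach (int x in results)
        {
            table.Rows.Add(x);
        }

        return table;
    }
}
```

A call such as `ThrottledPlinqSketch.BuildTable(Enumerable.Range(1, 100), Math.Max(1, Environment.ProcessorCount / 2))` would start at half the available cores, which is an arbitrary starting point to adjust from. Note that the rows are not guaranteed to arrive in input order.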

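Disclaimer: everything below is just for fun, although it does actually work.

Of course, you can always kick it up a notch and write a full-blown asynchronous Parallel.ForEach implementation, which lets you process the input in parallel and throttle asynchronously without blocking any thread pool threads.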

async Task ParallelForEachAsync<TInput, TResult>(IEnumerable<TInput> input,
                                                 int maxDegreeOfParallelism,
                                                 Func<TInput, Task<TResult>> body,
                                                 Action<TResult> onCompleted)
{
    Queue<TInput> queue = new Queue<TInput>(input);

    if (queue.Count == 0) {
        return;
    }

    List<Task<TResult>> tasksInFlight = new List<Task<TResult>>(maxDegreeOfParallelism);

    do
    {
        while (tasksInFlight.Count < maxDegreeOfParallelism && queue.Count != 0)
        {
            TInput item = queue.Dequeue();
            Task<TResult> task = body(item);

            tasksInFlight.Add(task);
        }

        Task<TResult> completedTask = await Task.WhenAny(tasksInFlight).ConfigureAwait(false);

        tasksInFlight.Remove(completedTask);

        TResult result = completedTask.GetAwaiter().GetResult(); // We know the task has completed. No need for await.

        onCompleted(result);
    }
    while (queue.Count != 0 || tasksInFlight.Count != 0);
}

Usage (full Fiddle here):

async Task<DataTable> ProcessAllAsync(IEnumerable<InputObject> allData)
{
    DataTable table = CreateTable();
    int maxDegreeOfParallelism = Environment.ProcessorCount;

    await ParallelForEachAsync(
        allData,
        maxDegreeOfParallelism,
        // Loop body: these Tasks will run in parallel, up to {maxDegreeOfParallelism} at any given time.
        async dataObj =>
        {
            FinalObject o = await Task.Run(() => Process(dataObj)).ConfigureAwait(false); // Thread pool processing.

            await Task.Delay(100).ConfigureAwait(false); // Artificial throttling.

            return o;
        },
        // Completion handler: these will be executed one at a time, and can safely mutate shared state.
        moveObj => table.Rows.Add(moveObj.x)
    );

    return table;
}

struct InputObject
{
    public int x;
}

struct FinalObject
{
    public int x;
}

FinalObject Process(InputObject o)
{
    // Simulate synchronous work.
    Thread.Sleep(100);

    return new FinalObject { x = o.x };
}

Same behavior, but without Thread.Sleep or ConcurrentBag<T>.

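Answer 1 (score: 0)

I think something like this should give better performance. object[] looks like a better choice than DataRow, because you need a DataTable in order to obtain DataRow objects.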

ConcurrentBag<object[]> bag = new ConcurrentBag<object[]>();

Parallel.ForEach(allData, 
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount }, 
    dataObj =>
{
    object[] row = new object[colCount];

    //do processing

    bag.Add(row);

    Thread.Sleep(100);
});

DataTable table = createTable();
foreach (object[] row in bag)
{
    table.Rows.Add(row);
}
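To make the idea concrete, here is a minimal self-contained sketch of the object[] approach. The two-column schema, the class name, and the generated values are assumptions for illustration only. Each array is filled on a worker thread without touching the DataTable, and DataRowCollection.Add(params object[]) then maps the values to columns by position on a single thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Data;
using System.Linq;
using System.Threading.Tasks;

static class ObjectArrayRowsSketch
{
    public static DataTable BuildTable(int itemCount)
    {
        // Assumed schema for illustration: (Id int, Name string).
        DataTable table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        int colCount = table.Columns.Count;

        var bag = new ConcurrentBag<object[]>();

        Parallel.ForEach(Enumerable.Range(0, itemCount), i =>
        {
            // Plain arrays need no DataTable, so no shared state is touched here.
            object[] row = new object[colCount];
            row[0] = i;
            row[1] = "item " + i;
            bag.Add(row);
        });

        // Single-threaded: values are matched to columns by position.
        foreach (object[] row in bag)
        {
            table.Rows.Add(row);
        }

        return table;
    }
}
```

The array positions must line up with the column order of the table, and ConcurrentBag does not preserve insertion order, so the rows end up in an unspecified order.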

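Answer 2 (score: 0)

It sounds like running everything in parallel has made this more complicated than it needs to be, but if you store DataRow objects in the bag instead of plain objects, at the end you can use DataTableExtensions to create a DataTable from a generic collection quite easily:

var dataTable = bag.CopyToDataTable();

Just add a reference to the assembly that defines DataTableExtensions (System.Data.DataSetExtensions on the .NET Framework) to your project.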
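A hedged sketch of how this could look end to end follows. The schema table, the lock object, the class name, and the computed value are all assumptions for illustration. DataTable is documented as safe only for concurrent reads, so this sketch serializes row creation with a lock, and it guards against an empty bag because CopyToDataTable throws when the source contains no rows.

```csharp
using System;
using System.Collections.Concurrent;
using System.Data;
using System.Linq;
using System.Threading.Tasks;

static class CopyToDataTableSketch
{
    public static DataTable BuildTable(int itemCount)
    {
        // Assumed schema: the rows need a table to be created from.
        DataTable schema = new DataTable();
        schema.Columns.Add("x", typeof(int));

        object gate = new object();
        var bag = new ConcurrentBag<DataRow>();

        Parallel.ForEach(Enumerable.Range(0, itemCount), i =>
        {
            // Hypothetical processing step, done outside the lock.
            int value = i * 2;

            // Creating and filling a row writes to the table's internals,
            // so this part is serialized.
            lock (gate)
            {
                DataRow row = schema.NewRow();
                row["x"] = value;
                bag.Add(row);
            }
        });

        // CopyToDataTable throws InvalidOperationException on an empty source.
        return bag.IsEmpty ? schema.Clone() : bag.CopyToDataTable();
    }
}
```

The lock means only the processing itself runs in parallel, so whether this beats adding rows in a plain loop afterwards depends on how expensive the processing is.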