Task Parallel Library mixed with async/await

Date: 2017-06-09 12:38:43

Tags: c# sql-server task-parallel-library

In a web application, we provide paginated search panels for the various database tables in our application. We currently allow users to select individual rows and, via the UI, execute some operation on each selected instance.

For example, a panel of document records offers an ability to delete documents. A user may check 15 checkboxes representing 15 document identifiers, and choose Options > Delete. This works just fine.

I wish to offer the users an option to execute some operation for all rows matching the query used to display the data in the panel.

We may have 5,000 documents matching some search criteria, and wish to allow a user to delete all 5,000. (I understand this example is a bit contrived; let's ignore the 'wisdom' of allowing users to delete documents in bulk!)

Execution of a method for thousands of rows is a long-running operation, so I will queue the operation instead. Consider this an equivalent of Gmail's ability to apply a filter to all email conversations matching some search criteria.

I need to execute a query that will return an unknown number of rows, and for each row, insert a row into a queue (in the code below, the queue is represented by ImportFileQueue).

I coded it as follows:

using (var reader = await source.InvokeDataReaderAsync(operation, parameters))
{
    Parallel.ForEach<IDictionary<string, object>>(reader.Enumerate(), async properties =>
    {
        try
        {
            var instance = new ImportFileQueueObject(User)
            {
                // application tier calculation here; cannot do in SQL
            };
            await instance.SaveAsync();
        }
        catch (System.Exception ex)
        {
            // omitted for brevity
        }
    });
}

When running this in a unit test that wraps the call in a Transaction, I receive a System.Data.SqlClient.SqlException with the message "Transaction context in use by another session."

This is easily resolved by either:

  • changing the database call from async to sync, or
  • removing the Parallel.ForEach and iterating through the reader serially (sketched below).
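
The second option would look roughly like this (a serial sketch reusing the same InvokeDataReaderAsync, Enumerate, and ImportFileQueueObject calls shown above):

using (var reader = await source.InvokeDataReaderAsync(operation, parameters))
{
    // Serial: each save is awaited before the next row is processed,
    // so only one connection participates in the ambient transaction at a time.
    foreach (IDictionary<string, object> properties in reader.Enumerate())
    {
        try
        {
            var instance = new ImportFileQueueObject(User)
            {
                // application tier calculation here; cannot do in SQL
            };
            await instance.SaveAsync();
        }
        catch (System.Exception ex)
        {
            // omitted for brevity
        }
    }
}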

I opted for the former:

using (var reader = await source.InvokeDataReaderAsync(operation, parameters))
{
    Parallel.ForEach<IDictionary<string, object>>(reader.Enumerate(), properties =>
    {
        try
        {
            var instance = new ImportFileQueueObject(User)
            {
                // Omitted for brevity
            };
            instance.Save();
        }
        catch (System.Exception ex)
        {
            // omitted for brevity
        }
    });
}

My thought process is that, in typical use cases:

  • the outer reader will often have thousands of rows
  • the instance.Save() call is "lightweight", inserting a single row into the database

Two questions:

  1. Is there a reasonable way to use async/await inside the Parallel.ForEach where the inner code uses SqlConnection (avoiding the TransactionContext error)?
  2. If not, given my expected typical use case, is my choice to leverage the TPL and forfeit async/await for the single-row saves reasonable?

The answer to What is the reason of “Transaction context in use by another session” suggests:

Avoid multi-threaded data operations if it's possible (no matter loading or saving). E.g. save SELECT/UPDATE/ etc... requests in a single queue and serve them with a single-thread worker;

but I'm trying to minimize total execution time, and figured Parallel.ForEach was more likely to reduce it.
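
For reference, a rough sketch of the single-queue / single-worker pattern that answer describes might look like the following (the BlockingCollection plumbing is my own illustration, not code from the linked answer; it reuses the reader, User, and ImportFileQueueObject from above and requires System.Collections.Concurrent and System.Threading.Tasks):

// One producer enqueues rows; a single worker performs every save,
// so only one session ever touches the transaction context.
var queue = new BlockingCollection<IDictionary<string, object>>(boundedCapacity: 1000);

var worker = Task.Run(() =>
{
    foreach (var properties in queue.GetConsumingEnumerable())
    {
        var instance = new ImportFileQueueObject(User)
        {
            // application tier calculation here; cannot do in SQL
        };
        instance.Save(); // serialized, single-threaded saves
    }
});

using (var reader = await source.InvokeDataReaderAsync(operation, parameters))
{
    foreach (var properties in reader.Enumerate())
    {
        queue.Add(properties);
    }
}

queue.CompleteAdding();
await worker;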

1 answer:

Answer 0: (score: 1)

It's almost always a bad idea to open a transaction and then wait for I/O while holding it open. You'll get much better performance (and fewer deadlocks) by buffering the data first. If there's more total data than you can easily buffer in memory, buffer it into chunks of a thousand or so rows at a time. Put each of those in a separate transaction if possible.
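
A rough sketch of that buffer-then-chunk approach, reusing the question's helpers (the TransactionScope usage, the 1,000-row chunk size, and the use of System.Linq and System.Transactions are illustrative assumptions, not part of the original code):

// Buffer the reader results first, then write them in chunks,
// each chunk in its own short-lived transaction.
// (Requires System.Linq and System.Transactions.)
const int chunkSize = 1000; // illustrative

var buffer = new List<IDictionary<string, object>>();
using (var reader = await source.InvokeDataReaderAsync(operation, parameters))
{
    foreach (var properties in reader.Enumerate())
    {
        buffer.Add(properties);
    }
}

for (int i = 0; i < buffer.Count; i += chunkSize)
{
    var chunk = buffer.Skip(i).Take(chunkSize).ToList();

    using (var scope = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
    {
        foreach (var properties in chunk)
        {
            var instance = new ImportFileQueueObject(User)
            {
                // application tier calculation here; cannot do in SQL
            };
            await instance.SaveAsync();
        }

        scope.Complete();
    }
}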

Whenever you open a transaction, any locks taken remain open until it is committed (and locks get taken whether you want to or not when you're inserting data). Those locks cause other updates or reads without WITH(NOLOCK) to sit and wait until the transaction is committed. In a high-performance system, if you're doing I/O while those locks are held, it's pretty-much guaranteed to cause problems as other callers start an operation and then sit and wait while this operation does I/O outside the transaction.