Adding large amounts of data to Azure Table Storage asynchronously and efficiently

Date: 2018-11-22 10:09:24

Tags: c# azure asynchronous bulkinsert azure-table-storage

I'm trying to optimize an operation that inserts tens of thousands of Foos into an Azure table.

The current method looks like this:

public void AddBulk(IReadOnlyList<Foo> foos)
{
    var parallelOptions = new ParallelOptions() { MaxDegreeOfParallelism = 4 };
    Parallel.ForEach(foos.GroupBy(x => x.QueryingId), parallelOptions, groupedFoos =>
    {
        var threadTable = Table;

        foreach (var chunkedAmounts in groupedFoos.ToList().Chunk(100))
        {
            var batchOperation = new TableBatchOperation();

            foreach (var amount in chunkedAmounts)
            {
                // Echo content off. This further reduces bandwidth usage by turning off the
                // echo of the payload in the response during entity insertion.
                batchOperation.Insert(new FooTableEntity(amount), false);
            }

            // Exponential retry policies are good for batching procedures, background tasks,
            // or non-interactive scenarios. In these scenarios, you can typically allow more
            // time for the service to recover—with a consequently increased chance of the
            // operation eventually succeeding. Attempt delays: ~3s, ~7s, ~15s, ...
            // NOTE: blocking on the async call (sync over async) so each batch
            // completes before the next chunk is submitted; without blocking,
            // the batches would be fired and forgotten.
            threadTable.ExecuteBatchAsync(batchOperation, new TableRequestOptions()
            {
                RetryPolicy = new ExponentialRetry(TimeSpan.FromMilliseconds(deltaBackoffMilliseconds), maxRetryAttempts),
                MaximumExecutionTime = TimeSpan.FromSeconds(maxExecutionTimeSeconds),
            }, DefaultOperationContext).GetAwaiter().GetResult();
        }
    });
}

I have since upgraded the method to the .NET Core libraries, which do not support sync-over-async APIs. I am therefore re-evaluating the add method and converting it to be fully asynchronous.
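For reference, here is a minimal sketch of what that conversion could look like, assuming the same Table property, FooTableEntity type, and Chunk() helper as in the method above (AddBulkAsync and ExecuteThrottledAsync are hypothetical names). It awaits every batch and bounds concurrency with a SemaphoreSlim instead of Parallel.ForEach:

public async Task AddBulkAsync(IReadOnlyList<Foo> foos)
{
    // Same 4x parallelism as the sync version, but enforced with a semaphore.
    // (Also requires System.Threading and System.Threading.Tasks.)
    var throttle = new SemaphoreSlim(4);
    var tasks = new List<Task>();

    foreach (var groupedFoos in foos.GroupBy(x => x.QueryingId))
    {
        foreach (var chunkedAmounts in groupedFoos.Chunk(100))
        {
            var batchOperation = new TableBatchOperation();

            foreach (var amount in chunkedAmounts)
            {
                batchOperation.Insert(new FooTableEntity(amount), false);
            }

            tasks.Add(ExecuteThrottledAsync(batchOperation, throttle));
        }
    }

    // Surfaces any batch failure to the caller instead of swallowing it.
    await Task.WhenAll(tasks);
}

private async Task ExecuteThrottledAsync(TableBatchOperation batch, SemaphoreSlim throttle)
{
    await throttle.WaitAsync();
    try
    {
        // The same TableRequestOptions/OperationContext overload used above
        // would slot in here; omitted for brevity.
        await Table.ExecuteBatchAsync(batch);
    }
    finally
    {
        throttle.Release();
    }
}

This keeps the bound on in-flight batches that MaxDegreeOfParallelism provided, while letting the caller await completion and observe exceptions.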

The author of this method manually groups the foos by the ID that serves as the partition key, manually chunks them into batches of 100, and then uploads them with 4x parallelism. (Both limits are imposed by the service: an entity group transaction must target a single partition key and may contain at most 100 operations.) I'm surprised that this would be better than some built-in Azure operation.

What is the most efficient way to upload 100,000 rows (each consisting of 2 Guids, 2 strings, a timestamp, and an int) to Azure Table Storage?

0 Answers:

No answers