Adding large amounts of data to Azure Table Storage asynchronously and efficiently

Date: 2018-11-22 10:09:24

Tags: c# azure asynchronous bulkinsert azure-table-storage

I'm trying to optimize an operation that inserts tens of thousands of Foos into an Azure table.

The current method looks like this:

public void AddBulk(IReadOnlyList<Foo> foos)
{
    var parallelOptions = new ParallelOptions() { MaxDegreeOfParallelism = 4 };
    Parallel.ForEach(foos.GroupBy(x => x.QueryingId), parallelOptions, groupedFoos =>
    {
        var threadTable = Table;

        foreach (var chunkedAmounts in groupedFoos.ToList().Chunk(100))
        {
            var batchOperation = new TableBatchOperation();

            foreach (var amount in chunkedAmounts)
            {
                // Echo content off. This further reduces bandwidth usage by turning off the
                // echo of the payload in the response during entity insertion.
                batchOperation.Insert(new FooTableEntity(amount), false);
            }

            // Exponential retry policies are good for batching procedures, background tasks,
            // or non-interactive scenarios. In these scenarios, you can typically allow more
            // time for the service to recover—with a consequently increased chance of the
            // operation eventually succeeding. Attempt delays: ~3s, ~7s, ~15s, ...
            // NOTE: blocking on the async call (sync over async) so each batch
            // completes before the next chunk is submitted; without blocking,
            // the batches would be fired and forgotten.
            threadTable.ExecuteBatchAsync(batchOperation, new TableRequestOptions()
            {
                RetryPolicy = new ExponentialRetry(TimeSpan.FromMilliseconds(deltaBackoffMilliseconds), maxRetryAttempts),
                MaximumExecutionTime = TimeSpan.FromSeconds(maxExecutionTimeSeconds),
            }, DefaultOperationContext).GetAwaiter().GetResult();
        }
    });
}

I have since upgraded the method to the .NET Core libraries, which do not support sync-over-async APIs. I am therefore re-evaluating the add method and converting it to be fully asynchronous.
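For reference, here is a minimal sketch of what that conversion could look like, assuming the same Table property, FooTableEntity type, and Chunk() helper as in the method above (AddBulkAsync and ExecuteThrottledAsync are hypothetical names). It awaits every batch and bounds concurrency with a SemaphoreSlim instead of Parallel.ForEach:

public async Task AddBulkAsync(IReadOnlyList<Foo> foos)
{
    // Same 4x parallelism as the sync version, but enforced with a semaphore.
    // (Also requires System.Threading and System.Threading.Tasks.)
    var throttle = new SemaphoreSlim(4);
    var tasks = new List<Task>();

    foreach (var groupedFoos in foos.GroupBy(x => x.QueryingId))
    {
        foreach (var chunkedAmounts in groupedFoos.Chunk(100))
        {
            var batchOperation = new TableBatchOperation();

            foreach (var amount in chunkedAmounts)
            {
                batchOperation.Insert(new FooTableEntity(amount), false);
            }

            tasks.Add(ExecuteThrottledAsync(batchOperation, throttle));
        }
    }

    // Surfaces any batch failure to the caller instead of swallowing it.
    await Task.WhenAll(tasks);
}

private async Task ExecuteThrottledAsync(TableBatchOperation batch, SemaphoreSlim throttle)
{
    await throttle.WaitAsync();
    try
    {
        // The same TableRequestOptions/OperationContext overload used above
        // would slot in here; omitted for brevity.
        await Table.ExecuteBatchAsync(batch);
    }
    finally
    {
        throttle.Release();
    }
}

This keeps the bound on in-flight batches that MaxDegreeOfParallelism provided, while letting the caller await completion and observe exceptions.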

The author of this method manually groups the foos by the ID that serves as the partition key, manually chunks them into batches of 100, and then uploads them with 4x parallelism. (Both limits are imposed by the service: an entity group transaction must target a single partition key and may contain at most 100 operations.) I'm surprised that this would be better than some built-in Azure operation.

What is the most efficient way to upload 100,000 rows (each consisting of 2 Guids, 2 strings, a timestamp, and an int) to Azure Table Storage?

0 Answers:

No answers