I'm trying to optimize inserting tens of thousands of Foos into an Azure table.
The current method looks like this:
public void AddBulk(IReadOnlyList<Foo> foos)
{
    var parallelOptions = new ParallelOptions() { MaxDegreeOfParallelism = 4 };
    // Group by the partition-key ID: a table batch may only contain entities
    // that share the same partition key.
    Parallel.ForEach(foos.GroupBy(x => x.QueryingId), parallelOptions, groupedFoos =>
    {
        var threadTable = Table;
        // A batch operation is limited to 100 entities.
        foreach (var chunkedAmounts in groupedFoos.ToList().Chunk(100))
        {
            var batchOperation = new TableBatchOperation();
            foreach (var amount in chunkedAmounts)
            {
                // Echo content off. This further reduces bandwidth usage by turning off the
                // echo of the payload in the response during entity insertion.
                batchOperation.Insert(new FooTableEntity(amount), false);
            }
            // Exponential retry policies are good for batching procedures, background tasks,
            // or non-interactive scenarios. In these scenarios, you can typically allow more
            // time for the service to recover—with a consequently increased chance of the
            // operation eventually succeeding. Attempt delays: ~3s, ~7s, ~15s, ...
            // Block on the async call here (sync over async), since the method itself is synchronous.
            threadTable.ExecuteBatchAsync(batchOperation, new TableRequestOptions()
            {
                RetryPolicy = new ExponentialRetry(TimeSpan.FromMilliseconds(deltaBackoffMilliseconds), maxRetryAttempts),
                MaximumExecutionTime = TimeSpan.FromSeconds(maxExecutionTimeSeconds),
            }, DefaultOperationContext).GetAwaiter().GetResult();
        }
    });
}
I have since moved this method into a .NET Core library, which does not support the sync-over-async pattern. I'm therefore re-evaluating the add method and converting it to async.
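As a starting point, here is a minimal sketch of an async conversion. It reuses the types and fields from the method above (Table, FooTableEntity, the retry settings) and assumes the same usings plus System.Linq, System.Threading, and System.Threading.Tasks; SemaphoreSlim is my assumed replacement for Parallel.ForEach as the throttle, with 4 mirroring the original MaxDegreeOfParallelism:

public async Task AddBulkAsync(IReadOnlyList<Foo> foos)
{
    // Allow at most 4 partition groups to upload concurrently.
    var throttle = new SemaphoreSlim(4);
    var tasks = foos.GroupBy(x => x.QueryingId).Select(async groupedFoos =>
    {
        await throttle.WaitAsync();
        try
        {
            var threadTable = Table;
            foreach (var chunkedAmounts in groupedFoos.ToList().Chunk(100))
            {
                var batchOperation = new TableBatchOperation();
                foreach (var amount in chunkedAmounts)
                {
                    batchOperation.Insert(new FooTableEntity(amount), false);
                }
                // Awaited instead of blocked on, so no sync over async.
                await threadTable.ExecuteBatchAsync(batchOperation, new TableRequestOptions()
                {
                    RetryPolicy = new ExponentialRetry(TimeSpan.FromMilliseconds(deltaBackoffMilliseconds), maxRetryAttempts),
                    MaximumExecutionTime = TimeSpan.FromSeconds(maxExecutionTimeSeconds),
                }, DefaultOperationContext);
            }
        }
        finally
        {
            throttle.Release();
        }
    }).ToList();
    await Task.WhenAll(tasks);
}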
The author of this method manually groups the foos by the ID used as the partition key, manually splits each group into batches of 100, and then uploads them with a degree of parallelism of 4. I'm surprised that this would perform better than some built-in Azure operation.
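For comparison, the built-in batching primitive has the same constraints even in the newer Azure.Data.Tables SDK (the code above uses the older WindowsAzure.Storage / Microsoft.Azure.Cosmos.Table client): a transaction may contain at most 100 actions and all of them must share a partition key, so some grouping and chunking is unavoidable. A minimal sketch, where UploadChunkAsync is a hypothetical helper name:

using Azure.Data.Tables;

public static async Task UploadChunkAsync(TableClient table, IReadOnlyList<ITableEntity> chunk)
{
    // All entities in one transaction must share a partition key,
    // and a transaction can contain at most 100 actions.
    var actions = chunk
        .Select(entity => new TableTransactionAction(TableTransactionActionType.Add, entity))
        .ToList();
    await table.SubmitTransactionAsync(actions);
}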
What is the most efficient way to upload 100,000 rows (each consisting of 2 Guids, 2 strings, a timestamp, and an int) to Azure Table Storage?