Given a List of entities to update, is it safe to instantiate a new context on each iteration of a Parallel.For or foreach loop and call SubmitChanges() every (say) 10,000 iterations?
Is it safe to perform bulk updates this way? What are the possible drawbacks?
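For context, one way to read that pattern is sketched below: a fresh DataContext per chunk of work, with one SubmitChanges() per chunk. MyDataContext, MyEntity, and its Id/Value members are placeholders, not types from the question.

private static void UpdateInChunks(List<MyEntity> updates, int chunkSize)
{
    for (int start = 0; start < updates.Count; start += chunkSize)
    {
        // a fresh context (and pooled connection) per chunk of, say, 10,000 rows
        using (var context = new MyDataContext())
        {
            int end = Math.Min(start + chunkSize, updates.Count);
            for (int i = start; i < end; i++)
            {
                // re-query the row in this context and copy the new value onto it
                MyEntity row = context.MyEntities.Single(r => r.Id == updates[i].Id);
                row.Value = updates[i].Value;
            }
            context.SubmitChanges(); // one round of changes per chunk
        }
    }
}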
Answer 0 (score: 7)
This may be a scenario where parallelism should be avoided. Instantiating a new DataContext on every iteration means that, within each iteration, a connection is taken from the connection pool, opened, and a single entity is written to the database before the connection is returned to the pool. Doing this per iteration is a relatively expensive operation, so the overhead it generates outweighs the benefit of parallelism. Adding the entities to a single data context and writing them to the database as one operation is more efficient.
Using the following as a benchmark for parallel inserts:
private static TimeSpan RunInParallel(int inserts)
{
    Stopwatch watch = new Stopwatch();
    watch.Start();

    // allow up to 100 concurrent iterations; each iteration acquires its own
    // context (and pooled connection) and submits a single row
    Parallel.For(0, inserts, new ParallelOptions() { MaxDegreeOfParallelism = 100 },
        (i) =>
        {
            using (var context = new DataClasses1DataContext())
            {
                context.Tables.InsertOnSubmit(new Table() { Number = i });
                context.SubmitChanges();
            }
        });

    watch.Stop();
    return watch.Elapsed;
}
And for serial inserts:
private static TimeSpan RunInSerial(int inserts)
{
    Stopwatch watch = new Stopwatch();
    watch.Start();

    // one context for the whole run; all rows are submitted in a single call
    using (var ctx = new DataClasses1DataContext())
    {
        for (int i = 0; i < inserts; i++)
        {
            ctx.Tables.InsertOnSubmit(new Table() { Number = i });
        }
        ctx.SubmitChanges();
    }

    watch.Stop();
    return watch.Elapsed;
}
The DataClasses1DataContext class used above is an auto-generated DataContext exposing a Tables table whose Table entity has a Number column.
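A minimal hand-written equivalent of that generated code might look roughly like this (the Id column, the column types, and the connection string are assumptions; the real designer-generated file is considerably longer):

using System.Data.Linq;
using System.Data.Linq.Mapping;

[TableAttribute(Name = "Table")]
public class Table
{
    [Column(IsPrimaryKey = true, IsDbGenerated = true)]
    public int Id { get; set; }

    [Column]
    public int Number { get; set; }
}

public class DataClasses1DataContext : DataContext
{
    // connection string is a placeholder; the generated context reads it from settings
    public DataClasses1DataContext()
        : base(@"Data Source=.;Initial Catalog=Test;Integrated Security=True") { }

    public System.Data.Linq.Table<Table> Tables
    {
        get { return GetTable<Table>(); }
    }
}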
Running this on a first-generation Intel i7 (8 logical cores) gave the following results:
10 inserts:
Average time elapsed for a 100 runs in parallel: 00:00:00.0202820
Average time elapsed for a 100 runs in serial: 00:00:00.0108694
100 inserts:
Average time elapsed for a 100 runs in parallel: 00:00:00.2269799
Average time elapsed for a 100 runs in serial: 00:00:00.1434693
1000 inserts:
Average time elapsed for a 100 runs in parallel: 00:00:02.1647577
Average time elapsed for a 100 runs in serial: 00:00:00.8163786
10000 inserts:
Average time elapsed for a 10 runs in parallel: 00:00:22.7436584
Average time elapsed for a 10 runs in serial: 00:00:07.7273398
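The averages above can be produced with a small driver along these lines (a sketch; the exact harness is not shown in the answer, and clearing the target table between runs is omitted):

private static void BenchmarkInserts(int inserts, int runs)
{
    TimeSpan parallelTotal = TimeSpan.Zero;
    TimeSpan serialTotal = TimeSpan.Zero;

    for (int run = 0; run < runs; run++)
    {
        parallelTotal += RunInParallel(inserts);
        serialTotal += RunInSerial(inserts);
    }

    Console.WriteLine("{0} inserts:", inserts);
    Console.WriteLine("Average time elapsed for a {0} runs in parallel: {1}",
        runs, TimeSpan.FromTicks(parallelTotal.Ticks / runs));
    Console.WriteLine("Average time elapsed for a {0} runs in serial: {1}",
        runs, TimeSpan.FromTicks(serialTotal.Ticks / runs));
}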
In general, the inserts took roughly twice as long when run in parallel as when run serially.
Update: If you can apply some batching scheme to the data, running the inserts in parallel can be beneficial.
When batching, the batch size affects insert performance, so the optimal ratio between the number of entries per batch and the number of batches has to be determined. To demonstrate this, the following method was used to group 10,000 inserts into batches of 1 (10,000 batches, equivalent to the initial parallel approach), 10 (1,000 batches), 100 (100 batches), 1,000 (10 batches) and 10,000 (1 batch, equivalent to the serial approach), and then insert each batch in parallel:
private static TimeSpan RunAsParallelBatches(int inserts, int batchSize)
{
    Stopwatch watch = new Stopwatch();
    watch.Start();

    // batch the data to be inserted
    List<List<int>> batches = new List<List<int>>();
    for (int g = 0; g < inserts / batchSize; g++)
    {
        List<int> numbers = new List<int>();
        int start = g * batchSize;
        int end = start + batchSize;
        for (int i = start; i < end; i++)
        {
            numbers.Add(i);
        }
        batches.Add(numbers);
    }

    // insert each batch in parallel
    Parallel.ForEach(batches,
        (batch) =>
        {
            using (DataClasses1DataContext ctx = new DataClasses1DataContext())
            {
                foreach (int number in batch)
                {
                    ctx.Tables.InsertOnSubmit(new Table() { Number = number });
                }
                ctx.SubmitChanges();
            }
        });

    watch.Stop();
    return watch.Elapsed;
}
Averaging 10 runs of 10,000 inserts gives the following results:
10000 inserts repeated 10 times
Average time for initial parallel insertion approach: 00:00:22.7436584
Average time in parallel using batches of 1 entity (10000 batches): 00:00:23.1088289
Average time in parallel using batches of 10 entities (1000 batches): 00:00:07.1443220
Average time in parallel using batches of 100 entities (100 batches): 00:00:04.3111268
Average time in parallel using batches of 1000 entities (10 batches): 00:00:04.0668334
Average time in parallel using batches of 10000 entities (1 batch): 00:00:08.2820498
Average time for serial insertion approach: 00:00:07.7273398
So by grouping the inserts into batches, you get a performance gain as long as enough work is done per iteration to outweigh the overhead of setting up the DataContext and performing the batch insert. In this case, grouping the inserts into batches of 1,000 let the parallel insertion run roughly twice as fast as the serial approach on this system.
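The batch-size comparison above can be reproduced with a small sweep like this (a sketch; the batch sizes and run count are taken from the figures quoted above):

private static void BenchmarkBatchSizes()
{
    const int inserts = 10000;
    const int runs = 10;

    foreach (int batchSize in new[] { 1, 10, 100, 1000, 10000 })
    {
        TimeSpan total = TimeSpan.Zero;
        for (int run = 0; run < runs; run++)
        {
            total += RunAsParallelBatches(inserts, batchSize);
        }
        Console.WriteLine("Batches of {0} entities ({1} batches): {2}",
            batchSize, inserts / batchSize, TimeSpan.FromTicks(total.Ticks / runs));
    }
}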
Answer 1 (score: 1)
This can be done safely and will yield better performance. You need to make sure that: