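I've run into a bit of an edge case here. I've been tasked with pulling all of the data from one database into another, where the destination database has a different schema.
I chose to write a WinForms utility to do the data mapping and the transfer, using Entity Framework / ADO.NET where necessary.
This has worked well so far, except for one table that has 2.5 million records. When I ignore all the foreign keys, the transfer takes about 10 minutes in total. However, when I start mapping the foreign keys using FirstOrDefault() calls against in-memory lists of data that has already been moved to the destination database, it adds quite literally 4 days to the time taken.
I will need to run this tool a lot over the next few days, so this really isn't acceptable for me.
Here is my current approach (not my first approach; this is the result of a lot of trial and error for efficiency's sake):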
private OldModelContext _oldModelContext { get; } //instantiated in controller

using (var newModelContext = new NewModelContext())
{
    //Takes no time at all to load these into memory, collections are small, 3 - 20 records each
    var alreadyMigratedTable1 = newModelContext.alreadyMigratedTable1.ToList();
    var alreadyMigratedTable2 = newModelContext.alreadyMigratedTable2.ToList();
    var alreadyMigratedTable3 = newModelContext.alreadyMigratedTable3.ToList();
    var alreadyMigratedTable4 = newModelContext.alreadyMigratedTable4.ToList();
    var alreadyMigratedTable5 = newModelContext.alreadyMigratedTable5.ToList();

    var oldDatasetInMemory = _oldModelContext.MasterData.AsNoTracking().ToList();//2.5 Million records, takes about 6 minutes

    var table = new DataTable("MasterData");
    table.Columns.Add("Column1");
    table.Columns.Add("Column2");
    table.Columns.Add("Column3");
    table.Columns.Add("ForeignKeyColumn1");
    table.Columns.Add("ForeignKeyColumn2");
    table.Columns.Add("ForeignKeyColumn3");
    table.Columns.Add("ForeignKeyColumn4");
    table.Columns.Add("ForeignKeyColumn5");

    foreach (var masterData in oldDatasetInMemory)
    {
        DataRow row = table.NewRow();

        //With just these properties mapped, this takes about 2 minutes for all 2.5 Million
        row["Column1"] = masterData.Property1;
        row["Column2"] = masterData.Property2;
        row["Column3"] = masterData.Property3;

        //With this mapping, we add about 4 days to the overall process.
        row["ForeignKeyColumn1"] = alreadyMigratedTable1.FirstOrDefault(s => s.uniquePropertyOnNewDataset == masterData.uniquePropertyOnOldDataset);
        row["ForeignKeyColumn2"] = alreadyMigratedTable2.FirstOrDefault(s => s.uniquePropertyOnNewDataset == masterData.uniquePropertyOnOldDataset);
        row["ForeignKeyColumn3"] = alreadyMigratedTable3.FirstOrDefault(s => s.uniquePropertyOnNewDataset == masterData.uniquePropertyOnOldDataset);
        row["ForeignKeyColumn4"] = alreadyMigratedTable4.FirstOrDefault(s => s.uniquePropertyOnNewDataset == masterData.uniquePropertyOnOldDataset);
        row["ForeignKeyColumn5"] = alreadyMigratedTable5.FirstOrDefault(s => s.uniquePropertyOnNewDataset == masterData.uniquePropertyOnOldDataset);

        table.Rows.Add(row);
    }

    //Save table with SQLBulkCopy is very fast, takes about a minute and a half.
}
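Note: uniquePropertyOn(New/Old)Dataset is usually a unique description string shared between the two datasets. I can't match on IDs, because they won't be the same across the databases.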
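What I've tried:
- Casting with a select statement: not much improvement.
- .Where(predicate).FirstOrDefault(): didn't see any noticeable improvement.
- FirstOrDefault() in another form: didn't find any improvement.
- Turning the foreach into a parallel foreach loop and locking the calls to the DataTable. I keep running into "Entity Framework connection closed" problems when querying the in-memory lists from the parallel foreach. I'm not sure what that is about, but the initial speed results were promising.
I'd be happy to post that code and the errors if anyone thinks this is the right road to go down, but I'm not sure anymore.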
Answer 0 (score: 3)
The first thing I'd try is a dictionary, and pre-fetching the columns:
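// Fetch the DataColumn once, instead of resolving it by name for every row
var fk1 = table.Columns["ForeignKeyColumn1"];
// ...
// Index the already-migrated rows by their unique string key
var alreadyMigratedTable1 = newModelContext.alreadyMigratedTable1.ToDictionary(
    x => x.uniquePropertyOnNewDataset);
// ...
// Inside the loop: a hash lookup replaces the FirstOrDefault() scan
if (alreadyMigratedTable1.TryGetValue(masterData.uniquePropertyOnOldDataset, out var val))
    row[fk1] = val;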
But, actually: I'd also try to avoid the whole DataTable piece unless it is really, really needed.
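As a rough, self-contained sketch of the dictionary idea (not the asker's real model): `OldRow`, `Lookup`, `FkMapper` and the `Id` property below are hypothetical stand-ins, and the sketch assumes the foreign key column should hold the migrated row's integer ID rather than the entity itself. It shows only one foreign key; the other four would follow the same pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical stand-ins for the EF entities in the question.
public record OldRow(string Property1, string UniqueKey);
public record Lookup(int Id, string UniqueKey);

public static class FkMapper
{
    public static DataTable Build(IEnumerable<OldRow> oldRows, IEnumerable<Lookup> lookups)
    {
        var table = new DataTable("MasterData");

        // Keep the DataColumn references so rows are indexed by column, not by name.
        DataColumn col1 = table.Columns.Add("Column1", typeof(string));
        DataColumn fk1 = table.Columns.Add("ForeignKeyColumn1", typeof(int));

        // Build the index once; ToDictionary throws if the key is not unique.
        Dictionary<string, int> idByKey =
            lookups.ToDictionary(x => x.UniqueKey, x => x.Id);

        foreach (OldRow old in oldRows)
        {
            DataRow row = table.NewRow();
            row[col1] = old.Property1;

            // Hash lookup per row; unmatched rows keep the default DBNull value.
            if (old.UniqueKey != null && idByKey.TryGetValue(old.UniqueKey, out int id))
                row[fk1] = id;

            table.Rows.Add(row);
        }

        return table;
    }
}
```

Calling `FkMapper.Build(oldRows, lookups)` returns a table that could then be handed to the bulk copy step the question already uses.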
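Answer 1 (score: 2)
If there really is no other way to migrate this data than to load all of it into memory, you can avoid this nested loop and link the lists via Join, which is more efficient.
Read: Why is LINQ JOIN so much faster than linking with WHERE?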
var newData =
    from master in oldDatasetInMemory
    join t1 in alreadyMigratedTable1
        on master.uniquePropertyOnOldDataset equals t1.uniquePropertyOnNewDataset into t1Group
    from join1 in t1Group.Take(1).DefaultIfEmpty()
    join t2 in alreadyMigratedTable2
        on master.uniquePropertyOnOldDataset equals t2.uniquePropertyOnNewDataset into t2Group
    from join2 in t2Group.Take(1).DefaultIfEmpty()
    join t3 in alreadyMigratedTable3
        on master.uniquePropertyOnOldDataset equals t3.uniquePropertyOnNewDataset into t3Group
    from join3 in t3Group.Take(1).DefaultIfEmpty()
    join t4 in alreadyMigratedTable4
        on master.uniquePropertyOnOldDataset equals t4.uniquePropertyOnNewDataset into t4Group
    from join4 in t4Group.Take(1).DefaultIfEmpty()
    join t5 in alreadyMigratedTable5
        on master.uniquePropertyOnOldDataset equals t5.uniquePropertyOnNewDataset into t5Group
    from join5 in t5Group.Take(1).DefaultIfEmpty()
    select new { master, join1, join2, join3, join4, join5 };
foreach (var x in newData)
{
    DataRow row = table.Rows.Add();
    row["Column1"] = x.master.Property1;
    row["Column2"] = x.master.Property2;
    row["Column3"] = x.master.Property3;
    row["ForeignKeyColumn1"] = x.join1;
    row["ForeignKeyColumn2"] = x.join2;
    row["ForeignKeyColumn3"] = x.join3;
    row["ForeignKeyColumn4"] = x.join4;
    row["ForeignKeyColumn5"] = x.join5;
}
This is a LINQ left outer join that takes only one row from the right-hand side.
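To make the "left outer join, one row from the right" behaviour concrete, here is a small sketch with a single join over in-memory lists. The `Master` and `Migrated` records, the `LeftJoinDemo` class and the `Id` property are made up for illustration and are not part of the question's model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the old and already-migrated entities.
public record Master(string Name, string Key);
public record Migrated(int Id, string Key);

public static class LeftJoinDemo
{
    public static List<(string Name, int? FkId)> Map(
        IEnumerable<Master> masters, IEnumerable<Migrated> migrated)
    {
        var query =
            from m in masters
            // "join ... into" is a group join: every master row is kept,
            // paired with the (possibly empty) group of matching rows.
            join t in migrated on m.Key equals t.Key into tGroup
            // Take(1) keeps at most one match; DefaultIfEmpty() yields null
            // when there is none, which is what makes this a left outer join.
            from match in tGroup.Take(1).DefaultIfEmpty()
            select (m.Name, FkId: match?.Id);

        return query.ToList();
    }
}
```

With two migrated rows sharing the same key, only the first is used for the matching master row, and a master row with no match comes back with a null `FkId` instead of being dropped.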