I have a DataTable with 2.5M rows, and I want to filter out some of its rows.
The DataTable's columns:
[IntCode] long
[BDIntCode] long
[TxnDT] DateTime
[TxnQuantity] decimal
[RecordUser] long
[RecordDT] DateTime
My code is as follows:
foreach (var down in breakDowns)
{
    sw.Start();
    var relatedBreakDowns = firstGroup.Where(x => x.RelatedBDIntCode == down.ProcessingRowIntCode).ToList();
    if (relatedBreakDowns.Count == 0) continue;
    var filters = string.Format("BDIntCode IN ({0})", string.Join(",", relatedBreakDowns.Select(x => x.BDIntCode)));
    var filteredDatatable = datatable.Select(filters, "BDIntCode");
    foreach (var dataRow in filteredDatatable)
    {
        var r = dataTableSchema.NewRow();
        r["RecordUser"] = recordUser;
        r["RecordDT"] = DateTime.Now;
        r["TxnQuantity"] = dataRow["TxnQuantity"];
        r["TxnDT"] = dataRow["TxnDT"];
        r["BDIntCode"] = down.ProcessingRowIntCode;
        dataTableSchema.Rows.Add(r);
    }
    sw.Stop();
    count++;
    Console.WriteLine("Group: " + unrelatedBreakDownGroup.RelatedBDGroupIntCode + ", Count : " + count + ", ElapsedTime : ms = " + sw.ElapsedMilliseconds + ", sec = " + sw.ElapsedMilliseconds / 1000f );
    sw.Reset();
}
The breakDowns list has 1805 items and the firstGroup list has 9880.
Answer 0: (score: 0)
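Personally, I would start with a List<SomeType> rather than a DataTable. Then I would index the data: in your case you are searching by RelatedBDIntCode and expecting multiple matches, so: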
var index = firstGroup.ToLookup(x => x.RelatedBDIntCode);
foreach (var down in breakDowns) {
    var matches = index[down.ProcessingRowIntCode].ToList();
    //...
}
This avoids a full scan of firstGroup for every item in breakDowns.
Next, the IN filter can be moved to a similar indexed lookup, this time presumably keyed on BDIntCode.
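To make that second step concrete, here is a minimal sketch of what an index on BDIntCode might look like. It assumes BDIntCode is a non-null long column; the BreakDownLink class and the RowIndexSketch helper are hypothetical names standing in for whatever firstGroup actually contains, not anything from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical stand-in for the element type of firstGroup.
public sealed class BreakDownLink
{
    public long RelatedBDIntCode { get; set; }
    public long BDIntCode { get; set; }
}

public static class RowIndexSketch
{
    // Groups the table's rows by BDIntCode in a single pass over the table.
    // Assumes the column is never DBNull; Field<long> throws otherwise.
    public static ILookup<long, DataRow> IndexByBDIntCode(DataTable table)
    {
        return table.AsEnumerable().ToLookup(row => row.Field<long>("BDIntCode"));
    }

    // Collects the rows for every BDIntCode linked to one ProcessingRowIntCode.
    // A lookup returns an empty sequence for a missing key, so no key check is needed.
    public static List<DataRow> RowsFor(
        long processingRowIntCode,
        ILookup<long, BreakDownLink> linksByRelated,
        ILookup<long, DataRow> rowsByBDIntCode)
    {
        return linksByRelated[processingRowIntCode]
            .Select(link => link.BDIntCode)
            .Distinct()                 // IN (...) matches each row once, even for repeated codes
            .OrderBy(code => code)      // mirrors the "BDIntCode" sort passed to Select
            .SelectMany(code => rowsByBDIntCode[code])
            .ToList();
    }
}
```

With both lookups built once before the loop, each pass over breakDowns would call RowsFor(down.ProcessingRowIntCode, ...) instead of building a filter string and calling datatable.Select.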
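Answer 1: (score: 0)
Just to elaborate on Marc's answer: you should try to reduce the number of iterations your code performs.
The way your code is currently written, you iterate over the breakDowns collection 1805 times, and for each of those iterations you scan all 9880 items of the firstGroup collection, for a total of 17,833,400 iterations, not counting the DataTable filter.
So your approach should be to index the data up front in order to reduce the number of iterations.
As a first step, you could build a dictionary that maps each RelatedBDIntCode to the matching rows of datatable. Then you can iterate over breakDowns and pull the mapped rows for each down, like this: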
var dtIndexed =
    firstGroup
    .GroupBy(x => x.RelatedBDIntCode)
    .ToDictionary
    (
        x => x.Key, //the RelatedBDIntCode you'll be selecting with
        x => //the mapped rows. This is the same method of filtering, but you could try others
        {
            var filters = string.Format("BDIntCode IN ({0})", string.Join(",", x.Select(y => y.BDIntCode)));
            return datatable.Select(filters, "BDIntCode");
        }
    );
foreach (var down in breakDowns)
{
    if(!dtIndexed.ContainsKey(down.ProcessingRowIntCode)) continue;
    var rows = dtIndexed[down.ProcessingRowIntCode];
    foreach (var row in rows)
    {
        var r = dataTableSchema.NewRow();
        r["RecordUser"] = recordUser;
        r["RecordDT"] = DateTime.Now;
        r["TxnQuantity"] = row["TxnQuantity"];
        r["TxnDT"] = row["TxnDT"];
        r["BDIntCode"] = down.ProcessingRowIntCode;
        dataTableSchema.Rows.Add(r);
    }
}
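This approach should reduce the number of iterations your code performs and therefore improve performance.
Note that in the code above I filter the DataTable in exactly the same way as before, i.e. datatable.Select(filter, order). You could also try datatable.AsEnumerable().Where(row => ...)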
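As a rough illustration of that alternative, the sketch below filters with LINQ and a HashSet instead of a filter string. It assumes BDIntCode is a non-null long column; LinqFilterSketch and SelectByBDIntCode are hypothetical names, and whether this is actually faster than datatable.Select on your data is something you would need to measure.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class LinqFilterSketch
{
    // Returns the rows whose BDIntCode is in the given set, sorted by BDIntCode,
    // which is the same result shape as Select("BDIntCode IN (...)", "BDIntCode").
    public static DataRow[] SelectByBDIntCode(DataTable table, IEnumerable<long> codes)
    {
        // HashSet gives constant-time membership checks per row.
        var wanted = new HashSet<long>(codes);
        return table.AsEnumerable()
            .Where(row => wanted.Contains(row.Field<long>("BDIntCode")))
            .OrderBy(row => row.Field<long>("BDIntCode")) // stable sort keeps table order within a code
            .ToArray();
    }
}
```

Inside the ToDictionary value selector this would replace the two filter lines with something like LinqFilterSketch.SelectByBDIntCode(datatable, x.Select(y => y.BDIntCode)). Each call still enumerates every row of the table, so it is still one full pass per group.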