Question

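I have a DataTable with Rows.Count = 2,000,000 and two columns containing integer values.

What I need is an efficient way to filter the DataTable inside a loop.

This is what I am doing at the moment: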

for (int i = 0; i < HugeDataTable.Rows.Count; i++)
{
  tempIp = int.Parse(HugeDataTable.Rows[i]["col1"].ToString());

  // Scans all of tumu on every iteration
  var filteredUsers = tumu.Select("col1 = " + tempIp.ToString()).Select(dr => dr.Field<int>("col2")).ToList();

  HashSet<int> filtered = new HashSet<int>(filteredUsers);

  Boolean[] userVector2 = userVectorBase
      .Select(item => filtered.Contains(item))
      .ToArray();

  ...
}

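What should I do to improve performance? I need every little trick. DataTable indexing and LINQ searches are what I found by googling. I would like to hear your suggestions. Thanks.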

Answer 1

You can use Parallel.For:

Parallel.For(0, table.Rows.Count, rowIndex =>
{
  // DataTable is documented as safe for concurrent reads; writes must be synchronized
  var row = table.Rows[rowIndex];
  // put your per-row calculation here
});

Please have a look at this post
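
To make the Parallel.For suggestion concrete, here is a minimal sketch. The sample table, the results array and the per-row calculation (col1 + col2) are placeholders I made up for illustration, not part of the question. Each iteration only reads from the table and writes to its own slot in a preallocated array, so no locking is needed under that assumption.

```csharp
using System;
using System.Data;
using System.Threading.Tasks;

public static class ParallelRowsSketch
{
    // Hypothetical per-row calculation: adds col1 and col2 for every row in parallel.
    public static int[] SumColumns(DataTable table)
    {
        var results = new int[table.Rows.Count];
        Parallel.For(0, table.Rows.Count, rowIndex =>
        {
            // Read-only access to the table; each iteration writes only its own slot.
            DataRow row = table.Rows[rowIndex];
            results[rowIndex] = row.Field<int>("col1") + row.Field<int>("col2");
        });
        return results;
    }

    public static void Main()
    {
        // Small stand-in for the 2,000,000-row table.
        var table = new DataTable();
        table.Columns.Add("col1", typeof(int));
        table.Columns.Add("col2", typeof(int));
        for (int i = 0; i < 1000; i++)
        {
            table.Rows.Add(i % 10, i);
        }

        int[] results = SumColumns(table);
        Console.WriteLine(results.Length);
    }
}
```

Note that parallelism only spreads the existing work across cores. If each iteration still runs tumu.Select over the whole table, the total work stays quadratic.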

Answer 2

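You are effectively running a double loop: tumu.Select scans the table on every iteration. If tumu contains many rows, that will be very slow.

Fix: build a dictionary of all users once, before the for loop. Inside the for loop, just look things up in the dictionary.

Something like this: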

// Key: col1 value, value: the matching col2 values. Initialize and fill it once, before the loop.
Dictionary<int, List<int>> usersByCode;
for (int i = 0; i < HugeDataTable.Rows.Count; i++)
{
  tempIp = int.Parse(HugeDataTable.Rows[i]["col1"].ToString());
  if (usersByCode.ContainsKey(tempIp))
  {
    // Do something
  }
}
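
A fuller sketch of the dictionary idea follows, under a few assumptions: tumu has int columns col1 and col2 as in the question, userVectorBase is a collection of ints, and the helper names BuildIndex and ToVector are hypothetical. The index is built in one pass, so each loop iteration becomes a hash lookup instead of a scan of the table.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class LookupSketch
{
    // One pass over the table: group the col2 values by their col1 value.
    public static Dictionary<int, HashSet<int>> BuildIndex(DataTable table)
    {
        var index = new Dictionary<int, HashSet<int>>();
        foreach (DataRow row in table.Rows)
        {
            int key = row.Field<int>("col1");
            if (!index.TryGetValue(key, out HashSet<int> values))
            {
                values = new HashSet<int>();
                index[key] = values;
            }
            values.Add(row.Field<int>("col2"));
        }
        return index;
    }

    // Same projection as in the question: true where the base item is in the filtered set.
    public static bool[] ToVector(HashSet<int> filtered, IEnumerable<int> userVectorBase)
    {
        return userVectorBase.Select(item => filtered.Contains(item)).ToArray();
    }

    public static void Main()
    {
        // Tiny stand-in data; the real tables come from the question.
        var tumu = new DataTable();
        tumu.Columns.Add("col1", typeof(int));
        tumu.Columns.Add("col2", typeof(int));
        tumu.Rows.Add(1, 10);
        tumu.Rows.Add(1, 20);
        tumu.Rows.Add(2, 30);

        int[] userVectorBase = { 10, 20, 30 }; // assumed to hold ints
        var index = BuildIndex(tumu);
        var empty = new HashSet<int>();

        foreach (DataRow row in tumu.Rows) // stands in for the loop over HugeDataTable
        {
            int tempIp = row.Field<int>("col1");
            // Fall back to an empty set when the key is not present.
            HashSet<int> filtered = index.TryGetValue(tempIp, out HashSet<int> found) ? found : empty;
            bool[] userVector2 = ToVector(filtered, userVectorBase);
            Console.WriteLine(string.Join(",", userVector2));
        }
    }
}
```

If many rows of HugeDataTable share the same col1 value, caching the computed vector per key would avoid repeating the projection as well.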

Filtering a large DataTable in a for loop
