Question

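I have a very large DataTable, about 4 million rows.

I need to compute columns in the table. If I process the entire column inside one method (Go1), it is faster than Go2, where I loop over the rows and call a method for each row.

I need to use the Go2 approach because later I will have to add more rows to the table and update all the columns.

But why is Go2 slower? Is it just the overhead of calling ProcessRow() every time?

Is there a workaround?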

public static void AddSignal()
{
    foreach (DataRow row in Data.Rows)
    {
        row[x] = (invertSignal ? -1:1)*Math.Sign(row.Field<double>(y) - row.Field<double>(y));
    }
}

public class ByRowAddSignal
{
    DataRow row;

    public ByRowAddSignal()
    {

    }

    public void ProcessRow(int r)
    {
        row = Data.Rows[r];
        row[x] = (invertSignal ? -1 : 1) * Math.Sign(row.Field<double>(y) - row.Field<double>(y));
    }
}

public static DataTable Data;
public void Go1()
{
      Data = LoadData();

      AddSignal();
}

public void Go2()
{
      Data = LoadData();

      ByRowAddSignal byRowAddSignal = new ByRowAddSignal ();

      for (int r = 0; r < Data.Rows.Count; r++)
      {
            byRowAddSignal.ProcessRow(r);
      }
}

Answer 1

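Looking at the code for DataRowCollection, we find the following indexer:

public DataRow this[int index] { get { return ((RBTree<DataRow>)this.list)[index]; } }

Here list is actually a tree (an RBTree<DataRow>), not an array-backed list, so indexing into it is not a constant-time operation: on every indexer call the tree has to be walked down to the appropriate element. The code of RBTree<K> shows this. A foreach over Data.Rows, as in Go1, goes through the collection's enumerator instead and avoids that per-index lookup.

Note: the code was decompiled using ILSpy.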
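
As a possible workaround, the per-row method can take the DataRow itself rather than an index, so the caller decides how to reach the row. The sketch below is only an illustration under stated assumptions: the class name SignalExample, the method names, and the column names "a", "b" and "signal" are hypothetical (the question only calls its columns x and y), and the signal column is assumed to hold int values. When rows really must be addressed by position, for example to process only rows appended later, the rows can be copied into a plain array once with DataRowCollection.CopyTo, so that each later lookup is an ordinary array access.

```csharp
using System;
using System.Data;

public static class SignalExample
{
    // Hypothetical column names; the question only refers to them as x and y.
    private const string SignalColumn = "signal";
    private const string ColumnA = "a";
    private const string ColumnB = "b";

    // Per-row logic receives the DataRow directly, so no lookup by index
    // into DataRowCollection happens here.
    public static void ProcessRow(DataRow row, bool invertSignal)
    {
        double diff = row.Field<double>(ColumnA) - row.Field<double>(ColumnB);
        row[SignalColumn] = (invertSignal ? -1 : 1) * Math.Sign(diff);
    }

    // Processes every row using the collection's enumerator.
    public static void ProcessAll(DataTable table, bool invertSignal)
    {
        foreach (DataRow row in table.Rows)
        {
            ProcessRow(row, invertSignal);
        }
    }

    // Processes rows from position 'start' onward, e.g. rows added later.
    // The rows are copied into an array once; indexing the array is cheap.
    public static void ProcessFrom(DataTable table, int start, bool invertSignal)
    {
        var rows = new DataRow[table.Rows.Count];
        table.Rows.CopyTo(rows, 0);

        for (int r = start; r < rows.Length; r++)
        {
            ProcessRow(rows[r], invertSignal);
        }
    }
}
```

With this shape, Go2 could call SignalExample.ProcessAll(Data, invertSignal) after loading, and SignalExample.ProcessFrom(Data, firstNewRow, invertSignal) after appending rows, where firstNewRow is the row count before the append. Whether this closes the whole gap on a 4-million-row table would need to be measured.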
