Question

I want to enumerate a DataTable in batches. To do that, I wrote a method that returns IEnumerable<DataTable>; it looks like this:

public IEnumerable<DataTable> EnumerateRowsInBatches( DataTable table, int batchSize ) {

    int rowCount = table.Rows.Count;
    int batchIndex = 0;
    while( batchIndex * batchSize < rowCount ) {
        DataTable result = table.Clone();
        int batchStart = batchIndex * batchSize;
        int batchLimit = ( batchIndex + 1 ) * batchSize;
        if( rowCount < batchLimit )
            batchLimit = rowCount;
        for( int i = batchStart; i < batchLimit; i++ ) {
            result.ImportRow( table.Rows[ i ] );
        }
        batchIndex++;
        yield return result;
    }
}
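To make the batch arithmetic concrete, here is a minimal sketch that runs the same logic on a hypothetical 2,500-row table with a batch size of 1,000. It is the question's method with the `batchIndex` bookkeeping condensed into a `for` loop and `Math.Min`. It is written as a local iterator function so the snippet compiles as a top-level program (.NET 5 or later assumed). The table name, its columns and the `source` parameter name are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical sample table: 2,500 rows with an Id and a Name column.
var table = new DataTable("Sample");
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Name", typeof(string));
for (int n = 0; n < 2500; n++)
    table.Rows.Add(n, "row " + n);

// Enumerate in batches of 1,000 and record how many rows each batch holds.
List<int> sizes = EnumerateRowsInBatches(table, 1000).Select(b => b.Rows.Count).ToList();
Console.WriteLine(string.Join(", ", sizes)); // 1000, 1000, 500

// Same logic as the question's method: a fresh schema-only clone per batch,
// filled with ImportRow. The batch bounds are computed with Math.Min.
static IEnumerable<DataTable> EnumerateRowsInBatches(DataTable source, int batchSize)
{
    // Guard added for this sketch: with batchSize <= 0 the loop would never end.
    if (batchSize <= 0)
        throw new ArgumentOutOfRangeException(nameof(batchSize));

    int rowCount = source.Rows.Count;
    for (int batchStart = 0; batchStart < rowCount; batchStart += batchSize)
    {
        DataTable result = source.Clone();                           // empty table, same columns
        int batchLimit = Math.Min(batchStart + batchSize, rowCount); // last batch may be shorter
        for (int i = batchStart; i < batchLimit; i++)
            result.ImportRow(source.Rows[i]);                        // copies the row into the batch
        yield return result;
    }
}
```

Every batch is a separate `DataTable`, so the caller may keep or buffer the batches freely. The cost is one `Clone()` plus one `ImportRow` per row.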

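This works fine in practice. I iterate over the batches and send each one to SQL Server as a table-valued parameter. However, I found that ImportRow accounts for most of the running time, and I would like to speed it up.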

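I'm looking for a way to do that. I am free to treat all of the data as read-only, so I suspect that copying the rows is not strictly necessary here.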

Answer 1

I used an approach that improved performance by about 40% in my tests:

public static IEnumerable<DataTable> EnumerateRowsInBatches(DataTable table,
                                                            int batchSize)
{
    int rowCount = table.Rows.Count;
    int batchIndex = 0;
    DataTable result = table.Clone(); // The schema never changes, so clone it once and reuse it
    while (batchIndex * batchSize < rowCount)
    {
        result.Rows.Clear(); // Reuse the same DataTable; drop the previous batch's rows
        int batchStart = batchIndex * batchSize;
        int batchLimit = (batchIndex + 1) * batchSize;
        if (rowCount < batchLimit)
            batchLimit = rowCount;

        for (int i = batchStart; i < batchLimit; i++)
            result.Rows.Add(table.Rows[i].ItemArray); // Add the row's values directly instead of calling ImportRow

        batchIndex++;
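        // Note: every iteration yields this same instance, so consume each
        // batch before requesting the next one.
        yield return result;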
    }
}
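This version has one side effect to keep in mind. Because `result` is created once and cleared on every iteration, each `yield return` hands back the same `DataTable` instance. That is fine when each batch is consumed before the next one is requested, as in the question's loop that sends one batch at a time. A caller that buffers the sequence (for example with `ToList()`) instead ends up with several references to one table that holds only the last batch. The sketch below demonstrates this on a hypothetical 25-row table. `EnumerateRowsInBatchesReused` is simply the answer's method condensed with `Math.Min` and written as a local iterator, so the snippet compiles as a top-level program (.NET 5 or later assumed).

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical sample table: 25 rows with a single Id column.
var table = new DataTable("Sample");
table.Columns.Add("Id", typeof(int));
for (int n = 0; n < 25; n++)
    table.Rows.Add(n);

// Consuming each batch before requesting the next one works as intended.
int total = 0;
foreach (DataTable batch in EnumerateRowsInBatchesReused(table, 10))
    total += batch.Rows.Count;
Console.WriteLine(total); // 25

// Buffering the sequence does not: all three elements are the same object,
// and it only holds the rows of the last batch (Ids 20..24).
List<DataTable> kept = EnumerateRowsInBatchesReused(table, 10).ToList();
Console.WriteLine(kept.Count);                        // 3
Console.WriteLine(ReferenceEquals(kept[0], kept[2])); // True
Console.WriteLine(kept[0].Rows.Count);                // 5

// The answer's approach: one clone reused for every batch, filled via ItemArray.
static IEnumerable<DataTable> EnumerateRowsInBatchesReused(DataTable source, int batchSize)
{
    // Guard added for this sketch: with batchSize <= 0 the loop would never end.
    if (batchSize <= 0)
        throw new ArgumentOutOfRangeException(nameof(batchSize));

    DataTable result = source.Clone();   // created once, outside the loop
    int rowCount = source.Rows.Count;
    for (int batchStart = 0; batchStart < rowCount; batchStart += batchSize)
    {
        result.Rows.Clear();             // discards the previous batch's rows
        int batchLimit = Math.Min(batchStart + batchSize, rowCount);
        for (int i = batchStart; i < batchLimit; i++)
            result.Rows.Add(source.Rows[i].ItemArray);
        yield return result;             // same instance every time
    }
}
```

If the batches need to outlive the loop, you can move `table.Clone()` back inside the loop. That keeps the `ItemArray` change and gives each batch its own table. You would need to measure how much of the reported speedup remains after that change.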

Efficiently enumerating DataTable rows in batches
