Best-performing code for large XLS files, comparing records one by one?

Time: 2018-06-24 11:54:35

Tags: c# excel winforms oledb xls

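I have read this topic several times: This SO Link, about comparing two XLS (Excel) files, and I have worked through and tried some small examples.

I want to write the best-performing C# code that reads two huge XLS files and compares the first row of file A against every row of file B. If the first row of A does not appear in any row of B, that row should be listed. The code then moves on to the next row of A.xls and compares it against all rows of file B again.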

Update 1:

(This is what I do):

DataTable dt1 = GetDataTableFromExcel(this.Directory, this.FirstFile, this.FirstFileSheetName);
dtRet = getDifferentRecords(dt1, dt2);
var adapter = new OleDbDataAdapter("SELECT * FROM [" + strSheetName + "$]", connectionString);

Update 2:

My main problem occurs when the XLS contains 4000 records! (large file)

1 answer:

Answer 0: (score: 2)

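As requested by OP, here is a VBA solution. Some details are guessed, so OP will need to adjust it to fit their specific use case.

For me, with 4000 records, this runs in under 2 seconds.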

Sub Demo()
    Dim wb1 As Workbook, wb2 As Workbook
    Dim ws1 As Worksheet, ws2 As Worksheet
    Dim r1 As Range, r2 As Range
    Dim v1 As Variant, v2 As Variant
    Dim rw1 As Long, rw2 As Long
    Dim cl As Long
    Dim Found  As Boolean

    Const NUM_COLS_COMPARE = 1 'adjust as required

    ' Get reference to, or open, workbooks
    Set wb1 = Application.Workbooks("NameOfBook1.xlsx")  'if already open
    Set wb2 = Application.Workbooks.Open("C:\Path\ToWorkbook2.xlsx") 'if not open

    'Get reference to sheets
    Set ws1 = wb1.Worksheets("NameOfSheet1")
    Set ws2 = wb2.Worksheets("NameOfSheet2")

    'get reference to ranges
    '  assuming data in Column A and Row 1 fill the whole range.  Adjust if necessary
    Set r1 = ws1.Range(ws1.Cells(1, ws1.Columns.Count).End(xlToLeft), _
                       ws1.Cells(ws1.Rows.Count, 1).End(xlUp))
    Set r2 = ws2.Range(ws2.Cells(1, ws2.Columns.Count).End(xlToLeft), _
                       ws2.Cells(ws2.Rows.Count, 1).End(xlUp))

    'Get Data into Array
    v1 = r1.Value2
    v2 = r2.Value2

    For rw1 = 1 To UBound(v1, 1)
        For rw2 = 1 To UBound(v2, 1)
            'Rows match only if every compared column is equal
            Found = True
            For cl = 1 To NUM_COLS_COMPARE
                If v1(rw1, cl) <> v2(rw2, cl) Then
                    Found = False
                    Exit For
                End If
            Next
            If Found Then Exit For
        Next rw2
        'List rows of sheet 1 that have no match in sheet 2
        If Not Found Then
            Debug.Print "No Match for " & rw1, v1(rw1, 1)
        End If
    Next rw1
End Sub
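
Since the question asked for C#, the same comparison could probably be sketched there with a hash set instead of nested loops. The nested loops do on the order of rows(A) × rows(B) comparisons, whereas a hash lookup needs roughly one pass over each table. This is only a minimal sketch under assumptions: both sheets are already loaded into DataTable objects (for example by the question's GetDataTableFromExcel), the first numColsCompare columns identify a row, and cell values are compared by their string form. RowComparer, RowsMissingFrom and MakeKey are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class RowComparer
{
    // Builds a comparison key from the first numColsCompare columns of a row.
    // Assumption: the unit separator '\u001F' never occurs inside cell text.
    private static string MakeKey(DataRow row, int numColsCompare)
    {
        return string.Join("\u001F",
            row.ItemArray.Take(numColsCompare).Select(v => Convert.ToString(v)));
    }

    // Returns the rows of table a whose key does not occur in any row of table b.
    public static List<DataRow> RowsMissingFrom(DataTable a, DataTable b, int numColsCompare)
    {
        // One pass over b to collect its keys.
        var keysInB = new HashSet<string>(
            b.Rows.Cast<DataRow>().Select(r => MakeKey(r, numColsCompare)));

        // One pass over a, with a hash lookup per row.
        return a.Rows.Cast<DataRow>()
                .Where(r => !keysInB.Contains(MakeKey(r, numColsCompare)))
                .ToList();
    }
}
```

With the tables from the question, a call might look like RowComparer.RowsMissingFrom(dt1, dt2, 1). Comparing by string form is a simplification: numbers and dates that display differently in the two files would need normalising first.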