Excel VBA性能 - 100万行 - 在不到1分钟的时间内删除包含值的行

时间:2015-06-20 22:01:51

标签: excel performance vba excel-vba

我试图在不到一分钟的时间内找到一种过滤大数据并删除工作表中的行的方法

目标:

  • 查找第1列中包含特定文本的所有记录,并删除整行
  • 保持所有单元格格式(颜色,字体,边框,列宽)和公式

测试数据:

Test data

代码的工作原理:

  1. 首先关闭所有Excel功能
  2. 如果工作簿不为空,并且第1列中存在要删除的文本值

    • 将列1的已用范围复制到数组
    • 向后迭代数组中的每个值
    • 找到匹配项时:

      • 以“"A11,A275,A3900,..."
      • 格式将单元格地址附加到tmp字符串
      • 如果tmp变量长度接近255个字符
      • 使用.Range("A11,A275,A3900,...").EntireRow.Delete Shift:=xlUp
      • 删除行
      • 将tmp重置为空并继续前进到下一组行
  3. 最后,它将所有Excel功能重新打开
  4. 主要问题是删除操作,总持续时间应低于一分钟。任何基于代码的解决方案都是可以接受的,只要它在1分钟内执行即可。

    这将范围缩小到极少数可接受的答案。已经提供的答案也非常简短,易于实施。 One在大约30秒内执行操作,因此至少有一个答案提供了可接受的解决方案,而其他答案可能会发现它也很有用

    我的主要初始功能:

    Sub DeleteRowsWithValuesStrings()
        Const MAX_SZ As Byte = 240
    
        Dim i As Long, j As Long, t As Double, ws As Worksheet
        Dim memArr As Variant, max As Long, tmp As String
    
        Set ws = Worksheets(1)
        max = GetMaxCell(ws.UsedRange).Row
        FastWB True:    t = Timer
    
        With ws
            If max > 1 Then
                If IndexOfValInRowOrCol("Test String", , ws.UsedRange) > 0 Then
                    memArr = .Range(.Cells(1, 1), .Cells(max, 1)).Value2
                    For i = max To 1 Step -1
    
                        If memArr(i, 1) = "Test String" Then
                            tmp = tmp & "A" & i & ","
                            If Len(tmp) > MAX_SZ Then
                               .Range(Left(tmp, Len(tmp) - 1)).EntireRow.Delete Shift:=xlUp
                               tmp = vbNullString
    
                            End If
                        End If
    
                    Next
                    If Len(tmp) > 0 Then
                        .Range(Left(tmp, Len(tmp) - 1)).EntireRow.Delete Shift:=xlUp
                    End If
                    .Calculate
                End If
            End If
        End With
        FastWB False:   InputBox "Duration: ", "Duration", Timer - t
    End Sub
    

    辅助功能(关闭和打开Excel功能):

    Public Sub FastWB(Optional ByVal opt As Boolean = True)
        With Application
            .Calculation = IIf(opt, xlCalculationManual, xlCalculationAutomatic)
            .DisplayAlerts = Not opt
            .DisplayStatusBar = Not opt
            .EnableAnimations = Not opt
            .EnableEvents = Not opt
            .ScreenUpdating = Not opt
        End With
        FastWS , opt
    End Sub
    
    Public Sub FastWS(Optional ByVal ws As Worksheet = Nothing, _
                      Optional ByVal opt As Boolean = True)
        If ws Is Nothing Then
            For Each ws In Application.ActiveWorkbook.Sheets
                EnableWS ws, opt
            Next
        Else
            EnableWS ws, opt
        End If
    End Sub
    
    Private Sub EnableWS(ByVal ws As Worksheet, ByVal opt As Boolean)
        With ws
            .DisplayPageBreaks = False
            .EnableCalculation = Not opt
            .EnableFormatConditionsCalculation = Not opt
            .EnablePivotTable = Not opt
        End With
    End Sub
    

    查找包含数据的最后一个单元格(感谢@ZygD - 现在我在几种情况下对其进行了测试):

    Public Function GetMaxCell(Optional ByRef rng As Range = Nothing) As Range
    
        'Returns the last cell containing a value, or A1 if Worksheet is empty
    
        Const NONEMPTY As String = "*"
        Dim lRow As Range, lCol As Range
    
        If rng Is Nothing Then Set rng = Application.ActiveWorkbook.ActiveSheet.UsedRange
        If WorksheetFunction.CountA(rng) = 0 Then
            Set GetMaxCell = rng.Parent.Cells(1, 1)
        Else
            With rng
                Set lRow = .Cells.Find(What:=NONEMPTY, LookIn:=xlFormulas, _
                                            After:=.Cells(1, 1), _
                                            SearchDirection:=xlPrevious, _
                                            SearchOrder:=xlByRows)
                If Not lRow Is Nothing Then
                    Set lCol = .Cells.Find(What:=NONEMPTY, LookIn:=xlFormulas, _
                                                After:=.Cells(1, 1), _
                                                SearchDirection:=xlPrevious, _
                                                SearchOrder:=xlByColumns)
    
                    Set GetMaxCell = .Parent.Cells(lRow.Row, lCol.Column)
                End If
            End With
        End If
    End Function
    

    返回数组中匹配项的索引,如果未找到匹配项,则返回0:

    Public Function IndexOfValInRowOrCol( _
                                        ByVal searchVal As String, _
                                        Optional ByRef ws As Worksheet = Nothing, _
                                        Optional ByRef rng As Range = Nothing, _
                                        Optional ByRef vertical As Boolean = True, _
                                        Optional ByRef rowOrColNum As Long = 1 _
                                        ) As Long
    
        'Returns position in Row or Column, or 0 if no matches found
    
        Dim usedRng As Range, result As Variant, searchRow As Long, searchCol As Long
    
        result = CVErr(9999) '- generate custom error
    
        Set usedRng = GetUsedRng(ws, rng)
        If Not usedRng Is Nothing Then
            If rowOrColNum < 1 Then rowOrColNum = 1
            With Application
                If vertical Then
                    result = .Match(searchVal, rng.Columns(rowOrColNum), 0)
                Else
                    result = .Match(searchVal, rng.Rows(rowOrColNum), 0)
                End If
            End With
        End If
        If IsError(result) Then IndexOfValInRowOrCol = 0 Else IndexOfValInRowOrCol = result
    End Function
    

    更新

    测试了6个解决方案(每个3个测试):到目前为止 Excel Hero's solution is the fastest (删除公式)

    以下是最快到最慢的结果:

    测试1.总共100,000条记录,10,000条要删除:

    1. ExcelHero()                    - 1.5 seconds
    
    2. DeleteRowsWithValuesNewSheet() - 2.4 seconds
    
    3. DeleteRowsWithValuesStrings()  - 2.45 minutes
    4. DeleteRowsWithValuesArray()    - 2.45 minutes
    5. QuickAndEasy()                 - 3.25 minutes
    6. DeleteRowsWithValuesUnion()    - Stopped after 5 minutes
    

    测试2.总计100万条记录,100,000条被删除:

    1. ExcelHero()                    - 16 seconds (average)
    
    2. DeleteRowsWithValuesNewSheet() - 33 seconds (average)
    
    3. DeleteRowsWithValuesStrings()  - 4 hrs 38 min (16701.375 sec)
    4. DeleteRowsWithValuesArray()    - 4 hrs 37 min (16626.3051757813 sec)
    5. QuickAndEasy()                 - 5 hrs 40 min (20434.2104492188 sec)
    6. DeleteRowsWithValuesUnion()    - N/A
    

    注意:

    1. ExcelHero方法:易于实现,可靠,速度极快,但删除公式
    2. NewSheet方法:易于实施,可靠且符合目标
    3. 字符串方法:更加努力实施,可靠,但不符合要求
    4. 数组方法:类似于字符串,但ReDims是一个数组(更快版本的Union)
    5. QuickAndEasy:易于实施(简短,可靠,优雅),但不符合要求
    6. Range Union:实现复杂度类似于2和3,但速度太慢
    7. 我还通过引入不寻常的值使测试数据更加真实:

      • 空单元格,范围,行和列
      • 特殊字符,例如= [`〜!@#$%^&amp; *()_- + {} [] \ |;:&#39;&#34;,。&lt;&gt; /?单独和多种组合
      • 空格,制表符,空公式,边框,字体和其他单元格格式
      • 带小数的大小数字(= 12.9999999999999 + 0.00000000000000001)
      • 超链接,条件格式规则
      • 内部和外部数据范围的空格式化
      • 可能导致数据问题的任何其他内容

5 个答案:

答案 0 :(得分:14)

我提供第一个答案作为参考

如果没有其他可用选项,其他人可能会发现它很有用

  • 实现结果的最快方法是不使用删除操作
  • 在100万条记录中,它删除了100,000行,平均 33秒

Sub DeleteRowsWithValuesNewSheet()  '100K records   10K to delete
                                    'Test 1:        2.40234375 sec
                                    'Test 2:        2.41796875 sec
                                    'Test 3:        2.40234375 sec
                                    '1M records     100K to delete
                                    'Test 1:        32.9140625 sec
                                    'Test 2:        33.1484375 sec
                                    'Test 3:        32.90625   sec
    Dim oldWs As Worksheet, newWs As Worksheet, rowHeights() As Long
    Dim wsName As String, t As Double, oldUsedRng As Range

    FastWB True:    t = Timer

    Set oldWs = Worksheets(1)
    wsName = oldWs.Name

    Set oldUsedRng = oldWs.Range("A1", GetMaxCell(oldWs.UsedRange))

    If oldUsedRng.Rows.Count > 1 Then                           'If sheet is not empty
        Set newWs = Sheets.Add(After:=oldWs)                    'Add new sheet
        With oldUsedRng
            .AutoFilter Field:=1, Criteria1:="<>Test String"
            .Copy                                               'Copy visible data
        End With
        With newWs.Cells
            .PasteSpecial xlPasteColumnWidths
            .PasteSpecial xlPasteAll                            'Paste data on new sheet
            .Cells(1, 1).Select                                 'Deselect paste area
            .Cells(1, 1).Copy                                   'Clear Clipboard
        End With
        oldWs.Delete                                            'Delete old sheet
        newWs.Name = wsName
    End If
    FastWB False:   InputBox "Duration: ", "Duration", Timer - t
End Sub

高层:

  • 它会创建一个新工作表,并保留对初始工作表的引用
  • 自动筛选搜索文本中的第1列:.AutoFilter Field:=1, Criteria1:="<>Test String"
  • 复制初始工作表中的所有(可见)数据
  • 将列宽,格式和数据粘贴到新工作表
  • 删除初始表
  • 将新工作表重命名为旧工作表名称

它使用问题

中发布的相同辅助函数

99%的持续时间由AutoFilter

使用

到目前为止,我发现了一些限制,第一个可以解决:

  1. 如果初始工作表上有任何隐藏的行,则会将其取消隐藏

    • 需要一个单独的功能来隐藏它们
    • 根据实施情况,可能会显着延长持续时间
  2. VBA相关:

    • 更改工作表的代码名称;引用Sheet1的其他VBA将被破坏(如果有的话)
    • 删除与初始工作表关联的所有VBA代码(如果有)
  3. 关于使用像这样的大文件的一些注意事项:

    • 二进制格式(.xlsb)大幅减少文件大小(从137 Mb到43 Mb)
    • 非托管条件格式规则可能会导致指数性能问题

      • 评论和数据验证相同
    • 从网络读取文件或数据比使用locall文件慢得多

答案 1 :(得分:9)

如果源数据不包含公式,或者方案允许(或希望)在条件行删除期间将公式转换为硬值,则可以实现速度的显着提高。

以上作为警告,我的解决方案使用范围对象的AdvancedFilter。它的速度大约是DeleteRowsWithValuesNewSheet()的两倍。

Public Sub ExcelHero()
    Dim t#, crit As Range, data As Range, ws As Worksheet
    Dim r&, fc As Range, lc As Range, fr1 As Range, fr2 As Range
    FastWB True
    t = Timer

        Set fc = ActiveSheet.UsedRange.Item(1)
        Set lc = GetMaxCell
        Set data = ActiveSheet.Range(fc, lc)
        Set ws = Sheets.Add
        With data
            Set fr1 = data.Worksheet.Range(fc, fc.Offset(, lc.Column))
            Set fr2 = ws.Range(ws.Cells(fc.Row, fc.Column), ws.Cells(fc.Row, lc.Column))
            With fr2
                fr1.Copy
                .PasteSpecial xlPasteColumnWidths: .PasteSpecial xlPasteAll
                .Item(1).Select
            End With
            Set crit = .Resize(2, 1).Offset(, lc.Column + 1)
            crit = [{"Column 1";"<>Test String"}]
            .AdvancedFilter xlFilterCopy, crit, fr2
            .Worksheet.Delete
        End With

    FastWB False
    r = ws.UsedRange.Rows.Count
    Debug.Print "Rows: " & r & ", Duration: " & Timer - t & " seconds"
End Sub

答案 2 :(得分:5)

在我的老人戴尔Inspiron 1564(Win 7 Office 2007)上:

Sub QuickAndEasy()
    Dim rng As Range
    Set rng = Range("AA2:AA1000001")
    Range("AB1") = Now
    Application.ScreenUpdating = False
        With rng
            .Formula = "=If(A2=""Test String"",0/0,A2)"
            .Cells.SpecialCells(xlCellTypeFormulas, xlErrors).EntireRow.Delete
            .Clear
        End With
    Application.ScreenUpdating = True
    Range("AC1") = Now
End Sub

跑了大约10秒钟。我假设列 AA 可用。

修改#1:

请注意,此代码计算设置为手动。如果在允许“帮助”列计算后将计算模式设置为手动,性能将得到改善。

答案 3 :(得分:1)

我知道我的回答非常晚,但是未来的访问者可能会发现它非常有用。

请注意:我的方法要求行的索引列以原始顺序结束,但是如果您不介意行的顺序不同,则索引列不是需要,可以删除额外的代码行。

我的方法:我的方法是简单地选择所选范围(列)中的所有行,使用Range.Sort按升序对它们进行排序,然后收集第一个和最后一个索引。所选范围(列)内的"Test String"。然后,我从第一个和最后一个索引创建一个范围,并使用Range.EntrieRow.Delete删除包含"Test String"的所有行。

优点:
- 它非常快。
- 它不会删除格式,公式,图表,图片或任何类似复制到新工作表的方法。

缺点:
- 一个相当大的代码来实现,但它是直截了当的。

测试范围生成子:

Sub DevelopTest()
    Dim index As Long
    FastWB True
    ActiveSheet.UsedRange.Clear
    For index = 1 To 1000000 '1 million test
        ActiveSheet.Cells(index, 1).Value = index
        If (index Mod 10) = 0 Then
            ActiveSheet.Cells(index, 2).Value = "Test String"
        Else
            ActiveSheet.Cells(index, 2).Value = "Blah Blah Blah"
        End If
    Next index
    Application.StatusBar = ""
    FastWB False
End Sub

过滤并删除子行:

Sub DeleteRowFast()
    Dim curWorksheet As Worksheet 'Current worksheet vairable

    Dim rangeSelection As Range   'Selected range
    Dim startBadVals As Long      'Start of the unwanted values
    Dim endBadVals As Long        'End of the unwanted values
    Dim strtTime As Double        'Timer variable
    Dim lastRow As Long           'Last Row variable
    Dim lastColumn As Long        'Last column variable
    Dim indexCell As Range        'Index range start
    Dim sortRange As Range        'The range which the sort is applied to
    Dim currRow As Range          'Current Row index for the for loop
    Dim cell As Range             'Current cell for use in the for loop

    On Error GoTo Err
        Set rangeSelection = Application.InputBox("Select the (N=) range to be checked", "Get Range", Type:=8)    'Get the desired range from the user
        Err.Clear

    M1 = MsgBox("This is recommended for large files (50,000 or more entries)", vbYesNo, "Enable Fast Workbook?") 'Prompt the user with an option to enable Fast Workbook, roughly 150% performace gains... Recommended for incredibly large files
    Select Case M1
        Case vbYes
            FastWB True  'Enable fast workbook
        Case vbNo
            FastWB False 'Disable fast workbook
    End Select

    strtTime = Timer     'Begin the timer

    Set curWorksheet = ActiveSheet
    lastRow = CLng(rangeSelection.SpecialCells(xlCellTypeLastCell).Row)
    lastColumn = curWorksheet.Cells(1, 16384).End(xlToLeft).Column

    Set indexCell = curWorksheet.Cells(1, 1)

    On Error Resume Next

    If rangeSelection.Rows.Count > 1 Then 'Check if there is anything to do

        lastVisRow = rangeSelection.Rows.Count

        Set sortRange = curWorksheet.Range(indexCell, curWorksheet.Cells(curWorksheet.Rows(lastRow).Row, 16384).End(xlToLeft)) 'Set the sort range

        sortRange.Sort Key1:=rangeSelection.Cells(1, 1), Order1:=xlAscending, Header:=xlNo 'Sort by values, lowest to highest

        startBadVals = rangeSelection.Find(What:="Test String", LookAt:=xlWhole, MatchCase:=False).Row
        endBadVals = rangeSelection.Find(What:="Test String", LookAt:=xlWhole, SearchDirection:=xlPrevious, MatchCase:=False).Row

        curWorksheet.Range(curWorksheet.Rows(startBadVals), curWorksheet.Rows(endBadVals)).EntireRow.Delete 'Delete uneeded rows, deleteing in continuous range blocks is quick than seperated or individual deletions.

        sortRange.Sort Key1:=indexCell, Order1:=xlAscending, Header:=xlNo 'Sort by index instead of values, lowest to highest
    End If

    Application.StatusBar = ""                    'Reset the status bar

    FastWB False                                  'Disable fast workbook

    MsgBox CStr(Round(Timer - strtTime, 2)) & "s" 'Display duration of task

Err:
    Exit Sub

End Sub

此代码使用FastWBFastWSEnableWS作者Paul Bica!

100K条目的时间(10k要删除,FastWB True):
1. 0.2秒
2. 0.2秒。
3. 0.21秒。
平均。 0.2秒

100万条目的时间(100k要删除,FastWB True):
1. 2.3秒。
2. 2.32秒。
3. 2.3秒。
平均。 2.31秒。

继续运行:Windows 10,iMac i3 11,2(自2010年起)

编辑
此代码最初的目的是过滤掉数值范围之外的数值,并且已经过调整以过滤掉"Test String",因此某些代码可能是多余的。

答案 4 :(得分:0)

在计算使用的范围和行数时使用数组可能会影响性能。这是另一种方法,在测试中证明在1m +行数据中有效 - 在25-30秒之间。它不使用过滤器,因此即使隐藏也会删除行。删除整行不会影响其他剩余行的格式或列宽。

  1. 首先,检查ActiveSheet是否具有“测试字符串”。由于您只对第1列感兴趣,我使用了这个:

    TCount = Application.WorksheetFunction.CountIf(sht.Columns(1), "Test String")
    If TCount > 0 Then
    
  2. 我只使用Cells.SpecialCells(xlCellTypeLastCell).Row来获取最后一行,而不是使用GetMaxCell()函数:

    EndRow = sht.Cells.SpecialCells(xlCellTypeLastCell).Row
    
  3. 然后循环遍历数据行:

    While r <= EndRow
    
  4. 测试第1列中的单元格是否等于“Test String”:

    If sht.Cells(r, 1).Text) = "Test String" Then
    
  5. 删除行:

    Rows(r).Delete Shift:=xlUp
    
  6. 将以下全部代码全部放在一起。我已将ActiveSheet设置为变量Sht并添加了ScreenUpdating,以提高效率。由于它是大量数据,我确保最后清除变量。

    Sub RowDeleter()
        Dim sht As Worksheet
        Dim r As Long
        Dim EndRow As Long
        Dim TCount As Long
        Dim s As Date
        Dim e As Date
    
        Application.ScreenUpdating = True
        r = 2       'Initialise row number
        s = Now     'Start Time
        Set sht = ActiveSheet
        EndRow = sht.Cells.SpecialCells(xlCellTypeLastCell).Row
    
        'Check if "Test String" is found in Column 1
        TCount = Application.WorksheetFunction.CountIf(sht.Columns(1), "Test String")
        If TCount > 0 Then
    
            'loop through to the End row
            While r <= EndRow
                If InStr(sht.Cells(r, 1).Text, "Test String") > 0 Then
                    sht.Rows(r).Delete Shift:=xlUp
                    r = r - 1
                End If
                r = r + 1
            Wend
        End If
        e = Now  'End Time
        D = (Hour(e) * 360 + Minute(e) * 60 + Second(e)) - (Hour(s) * 360 + Minute(s) * 60 + Second(s))
        Application.ScreenUpdating = True
        DurationTime = TimeSerial(0, 0, D)
        MsgBox Format(DurationTime, "hh:mm:ss")
    End Sub