我试图在不到一分钟的时间内找到一种过滤大数据并删除工作表中的行的方法
目标:
测试数据:
:
代码的工作原理:
如果工作簿不为空,并且第1列中存在要删除的文本值
找到匹配项时:
"A11,A275,A3900,..."
.Range("A11,A275,A3900,...").EntireRow.Delete Shift:=xlUp
主要问题是删除操作,总持续时间应低于一分钟。任何基于代码的解决方案都是可以接受的,只要它在1分钟内执行即可。
这将范围缩小到极少数可接受的答案。已经提供的答案也非常简短,易于实施。 One在大约30秒内执行操作,因此至少有一个答案提供了可接受的解决方案,而其他答案可能会发现它也很有用
我的主要初始功能:
Sub DeleteRowsWithValuesStrings()
Const MAX_SZ As Byte = 240
Dim i As Long, j As Long, t As Double, ws As Worksheet
Dim memArr As Variant, max As Long, tmp As String
Set ws = Worksheets(1)
max = GetMaxCell(ws.UsedRange).Row
FastWB True: t = Timer
With ws
If max > 1 Then
If IndexOfValInRowOrCol("Test String", , ws.UsedRange) > 0 Then
memArr = .Range(.Cells(1, 1), .Cells(max, 1)).Value2
For i = max To 1 Step -1
If memArr(i, 1) = "Test String" Then
tmp = tmp & "A" & i & ","
If Len(tmp) > MAX_SZ Then
.Range(Left(tmp, Len(tmp) - 1)).EntireRow.Delete Shift:=xlUp
tmp = vbNullString
End If
End If
Next
If Len(tmp) > 0 Then
.Range(Left(tmp, Len(tmp) - 1)).EntireRow.Delete Shift:=xlUp
End If
.Calculate
End If
End If
End With
FastWB False: InputBox "Duration: ", "Duration", Timer - t
End Sub
辅助功能(关闭和打开Excel功能):
Public Sub FastWB(Optional ByVal opt As Boolean = True)
With Application
.Calculation = IIf(opt, xlCalculationManual, xlCalculationAutomatic)
.DisplayAlerts = Not opt
.DisplayStatusBar = Not opt
.EnableAnimations = Not opt
.EnableEvents = Not opt
.ScreenUpdating = Not opt
End With
FastWS , opt
End Sub
Public Sub FastWS(Optional ByVal ws As Worksheet = Nothing, _
Optional ByVal opt As Boolean = True)
If ws Is Nothing Then
For Each ws In Application.ActiveWorkbook.Sheets
EnableWS ws, opt
Next
Else
EnableWS ws, opt
End If
End Sub
Private Sub EnableWS(ByVal ws As Worksheet, ByVal opt As Boolean)
With ws
.DisplayPageBreaks = False
.EnableCalculation = Not opt
.EnableFormatConditionsCalculation = Not opt
.EnablePivotTable = Not opt
End With
End Sub
查找包含数据的最后一个单元格(感谢@ZygD - 现在我在几种情况下对其进行了测试):
Public Function GetMaxCell(Optional ByRef rng As Range = Nothing) As Range
'Returns the last cell containing a value, or A1 if Worksheet is empty
Const NONEMPTY As String = "*"
Dim lRow As Range, lCol As Range
If rng Is Nothing Then Set rng = Application.ActiveWorkbook.ActiveSheet.UsedRange
If WorksheetFunction.CountA(rng) = 0 Then
Set GetMaxCell = rng.Parent.Cells(1, 1)
Else
With rng
Set lRow = .Cells.Find(What:=NONEMPTY, LookIn:=xlFormulas, _
After:=.Cells(1, 1), _
SearchDirection:=xlPrevious, _
SearchOrder:=xlByRows)
If Not lRow Is Nothing Then
Set lCol = .Cells.Find(What:=NONEMPTY, LookIn:=xlFormulas, _
After:=.Cells(1, 1), _
SearchDirection:=xlPrevious, _
SearchOrder:=xlByColumns)
Set GetMaxCell = .Parent.Cells(lRow.Row, lCol.Column)
End If
End With
End If
End Function
返回数组中匹配项的索引,如果未找到匹配项,则返回0:
Public Function IndexOfValInRowOrCol( _
ByVal searchVal As String, _
Optional ByRef ws As Worksheet = Nothing, _
Optional ByRef rng As Range = Nothing, _
Optional ByRef vertical As Boolean = True, _
Optional ByRef rowOrColNum As Long = 1 _
) As Long
'Returns position in Row or Column, or 0 if no matches found
Dim usedRng As Range, result As Variant, searchRow As Long, searchCol As Long
result = CVErr(9999) '- generate custom error
Set usedRng = GetUsedRng(ws, rng)
If Not usedRng Is Nothing Then
If rowOrColNum < 1 Then rowOrColNum = 1
With Application
If vertical Then
result = .Match(searchVal, rng.Columns(rowOrColNum), 0)
Else
result = .Match(searchVal, rng.Rows(rowOrColNum), 0)
End If
End With
End If
If IsError(result) Then IndexOfValInRowOrCol = 0 Else IndexOfValInRowOrCol = result
End Function
更新
测试了6个解决方案(每个3个测试):到目前为止 Excel Hero's solution is the fastest (删除公式)
以下是最快到最慢的结果:
测试1.总共100,000条记录,10,000条要删除:
1. ExcelHero() - 1.5 seconds
2. DeleteRowsWithValuesNewSheet() - 2.4 seconds
3. DeleteRowsWithValuesStrings() - 2.45 minutes
4. DeleteRowsWithValuesArray() - 2.45 minutes
5. QuickAndEasy() - 3.25 minutes
6. DeleteRowsWithValuesUnion() - Stopped after 5 minutes
测试2.总计100万条记录,100,000条被删除:
1. ExcelHero() - 16 seconds (average)
2. DeleteRowsWithValuesNewSheet() - 33 seconds (average)
3. DeleteRowsWithValuesStrings() - 4 hrs 38 min (16701.375 sec)
4. DeleteRowsWithValuesArray() - 4 hrs 37 min (16626.3051757813 sec)
5. QuickAndEasy() - 5 hrs 40 min (20434.2104492188 sec)
6. DeleteRowsWithValuesUnion() - N/A
注意:
我还通过引入不寻常的值使测试数据更加真实:
答案 0 :(得分:14)
我提供第一个答案作为参考
如果没有其他可用选项,其他人可能会发现它很有用
Sub DeleteRowsWithValuesNewSheet() '100K records 10K to delete
'Test 1: 2.40234375 sec
'Test 2: 2.41796875 sec
'Test 3: 2.40234375 sec
'1M records 100K to delete
'Test 1: 32.9140625 sec
'Test 2: 33.1484375 sec
'Test 3: 32.90625 sec
Dim oldWs As Worksheet, newWs As Worksheet, rowHeights() As Long
Dim wsName As String, t As Double, oldUsedRng As Range
FastWB True: t = Timer
Set oldWs = Worksheets(1)
wsName = oldWs.Name
Set oldUsedRng = oldWs.Range("A1", GetMaxCell(oldWs.UsedRange))
If oldUsedRng.Rows.Count > 1 Then 'If sheet is not empty
Set newWs = Sheets.Add(After:=oldWs) 'Add new sheet
With oldUsedRng
.AutoFilter Field:=1, Criteria1:="<>Test String"
.Copy 'Copy visible data
End With
With newWs.Cells
.PasteSpecial xlPasteColumnWidths
.PasteSpecial xlPasteAll 'Paste data on new sheet
.Cells(1, 1).Select 'Deselect paste area
.Cells(1, 1).Copy 'Clear Clipboard
End With
oldWs.Delete 'Delete old sheet
newWs.Name = wsName
End If
FastWB False: InputBox "Duration: ", "Duration", Timer - t
End Sub
高层:
.AutoFilter Field:=1, Criteria1:="<>Test String"
它使用问题
中发布的相同辅助函数99%的持续时间由AutoFilter
使用
到目前为止,我发现了一些限制,第一个可以解决:
如果初始工作表上有任何隐藏的行,则会将其取消隐藏
VBA相关:
关于使用像这样的大文件的一些注意事项:
非托管条件格式规则可能会导致指数性能问题
从网络读取文件或数据比使用locall文件慢得多
答案 1 :(得分:9)
如果源数据不包含公式,或者方案允许(或希望)在条件行删除期间将公式转换为硬值,则可以实现速度的显着提高。
以上作为警告,我的解决方案使用范围对象的AdvancedFilter。它的速度大约是DeleteRowsWithValuesNewSheet()的两倍。
Public Sub ExcelHero()
Dim t#, crit As Range, data As Range, ws As Worksheet
Dim r&, fc As Range, lc As Range, fr1 As Range, fr2 As Range
FastWB True
t = Timer
Set fc = ActiveSheet.UsedRange.Item(1)
Set lc = GetMaxCell
Set data = ActiveSheet.Range(fc, lc)
Set ws = Sheets.Add
With data
Set fr1 = data.Worksheet.Range(fc, fc.Offset(, lc.Column))
Set fr2 = ws.Range(ws.Cells(fc.Row, fc.Column), ws.Cells(fc.Row, lc.Column))
With fr2
fr1.Copy
.PasteSpecial xlPasteColumnWidths: .PasteSpecial xlPasteAll
.Item(1).Select
End With
Set crit = .Resize(2, 1).Offset(, lc.Column + 1)
crit = [{"Column 1";"<>Test String"}]
.AdvancedFilter xlFilterCopy, crit, fr2
.Worksheet.Delete
End With
FastWB False
r = ws.UsedRange.Rows.Count
Debug.Print "Rows: " & r & ", Duration: " & Timer - t & " seconds"
End Sub
答案 2 :(得分:5)
在我的老人戴尔Inspiron 1564(Win 7 Office 2007)上:
Sub QuickAndEasy()
Dim rng As Range
Set rng = Range("AA2:AA1000001")
Range("AB1") = Now
Application.ScreenUpdating = False
With rng
.Formula = "=If(A2=""Test String"",0/0,A2)"
.Cells.SpecialCells(xlCellTypeFormulas, xlErrors).EntireRow.Delete
.Clear
End With
Application.ScreenUpdating = True
Range("AC1") = Now
End Sub
跑了大约10秒钟。我假设列 AA 可用。
修改#1:强>
请注意,此代码不将计算设置为手动。如果在允许“帮助”列计算后将计算模式设置为手动,性能将得到改善。
答案 3 :(得分:1)
我知道我的回答非常晚,但是未来的访问者可能会发现它非常有用。
请注意:我的方法要求行的索引列以原始顺序结束,但是如果您不介意行的顺序不同,则索引列不是需要,可以删除额外的代码行。
我的方法:我的方法是简单地选择所选范围(列)中的所有行,使用Range.Sort
按升序对它们进行排序,然后收集第一个和最后一个索引。所选范围(列)内的"Test String"
。然后,我从第一个和最后一个索引创建一个范围,并使用Range.EntrieRow.Delete
删除包含"Test String"
的所有行。
优点:
- 它非常快。
- 它不会删除格式,公式,图表,图片或任何类似复制到新工作表的方法。
缺点:
- 一个相当大的代码来实现,但它是直截了当的。
测试范围生成子:
Sub DevelopTest()
Dim index As Long
FastWB True
ActiveSheet.UsedRange.Clear
For index = 1 To 1000000 '1 million test
ActiveSheet.Cells(index, 1).Value = index
If (index Mod 10) = 0 Then
ActiveSheet.Cells(index, 2).Value = "Test String"
Else
ActiveSheet.Cells(index, 2).Value = "Blah Blah Blah"
End If
Next index
Application.StatusBar = ""
FastWB False
End Sub
过滤并删除子行:
Sub DeleteRowFast()
Dim curWorksheet As Worksheet 'Current worksheet vairable
Dim rangeSelection As Range 'Selected range
Dim startBadVals As Long 'Start of the unwanted values
Dim endBadVals As Long 'End of the unwanted values
Dim strtTime As Double 'Timer variable
Dim lastRow As Long 'Last Row variable
Dim lastColumn As Long 'Last column variable
Dim indexCell As Range 'Index range start
Dim sortRange As Range 'The range which the sort is applied to
Dim currRow As Range 'Current Row index for the for loop
Dim cell As Range 'Current cell for use in the for loop
On Error GoTo Err
Set rangeSelection = Application.InputBox("Select the (N=) range to be checked", "Get Range", Type:=8) 'Get the desired range from the user
Err.Clear
M1 = MsgBox("This is recommended for large files (50,000 or more entries)", vbYesNo, "Enable Fast Workbook?") 'Prompt the user with an option to enable Fast Workbook, roughly 150% performace gains... Recommended for incredibly large files
Select Case M1
Case vbYes
FastWB True 'Enable fast workbook
Case vbNo
FastWB False 'Disable fast workbook
End Select
strtTime = Timer 'Begin the timer
Set curWorksheet = ActiveSheet
lastRow = CLng(rangeSelection.SpecialCells(xlCellTypeLastCell).Row)
lastColumn = curWorksheet.Cells(1, 16384).End(xlToLeft).Column
Set indexCell = curWorksheet.Cells(1, 1)
On Error Resume Next
If rangeSelection.Rows.Count > 1 Then 'Check if there is anything to do
lastVisRow = rangeSelection.Rows.Count
Set sortRange = curWorksheet.Range(indexCell, curWorksheet.Cells(curWorksheet.Rows(lastRow).Row, 16384).End(xlToLeft)) 'Set the sort range
sortRange.Sort Key1:=rangeSelection.Cells(1, 1), Order1:=xlAscending, Header:=xlNo 'Sort by values, lowest to highest
startBadVals = rangeSelection.Find(What:="Test String", LookAt:=xlWhole, MatchCase:=False).Row
endBadVals = rangeSelection.Find(What:="Test String", LookAt:=xlWhole, SearchDirection:=xlPrevious, MatchCase:=False).Row
curWorksheet.Range(curWorksheet.Rows(startBadVals), curWorksheet.Rows(endBadVals)).EntireRow.Delete 'Delete uneeded rows, deleteing in continuous range blocks is quick than seperated or individual deletions.
sortRange.Sort Key1:=indexCell, Order1:=xlAscending, Header:=xlNo 'Sort by index instead of values, lowest to highest
End If
Application.StatusBar = "" 'Reset the status bar
FastWB False 'Disable fast workbook
MsgBox CStr(Round(Timer - strtTime, 2)) & "s" 'Display duration of task
Err:
Exit Sub
End Sub
此代码使用FastWB
,FastWS
和EnableWS
作者Paul Bica!
100K条目的时间(10k要删除,FastWB True):
1. 0.2秒
2. 0.2秒。
3. 0.21秒。
平均。 0.2秒
100万条目的时间(100k要删除,FastWB True):
1. 2.3秒。
2. 2.32秒。
3. 2.3秒。
平均。 2.31秒。
继续运行:Windows 10,iMac i3 11,2(自2010年起)
编辑
此代码最初的目的是过滤掉数值范围之外的数值,并且已经过调整以过滤掉"Test String"
,因此某些代码可能是多余的。
答案 4 :(得分:0)
在计算使用的范围和行数时使用数组可能会影响性能。这是另一种方法,在测试中证明在1m +行数据中有效 - 在25-30秒之间。它不使用过滤器,因此即使隐藏也会删除行。删除整行不会影响其他剩余行的格式或列宽。
首先,检查ActiveSheet是否具有“测试字符串”。由于您只对第1列感兴趣,我使用了这个:
TCount = Application.WorksheetFunction.CountIf(sht.Columns(1), "Test String")
If TCount > 0 Then
我只使用Cells.SpecialCells(xlCellTypeLastCell).Row
来获取最后一行,而不是使用GetMaxCell()函数:
EndRow = sht.Cells.SpecialCells(xlCellTypeLastCell).Row
然后循环遍历数据行:
While r <= EndRow
测试第1列中的单元格是否等于“Test String”:
If sht.Cells(r, 1).Text) = "Test String" Then
删除行:
Rows(r).Delete Shift:=xlUp
将以下全部代码全部放在一起。我已将ActiveSheet设置为变量Sht并添加了ScreenUpdating,以提高效率。由于它是大量数据,我确保最后清除变量。
Sub RowDeleter()
Dim sht As Worksheet
Dim r As Long
Dim EndRow As Long
Dim TCount As Long
Dim s As Date
Dim e As Date
Application.ScreenUpdating = True
r = 2 'Initialise row number
s = Now 'Start Time
Set sht = ActiveSheet
EndRow = sht.Cells.SpecialCells(xlCellTypeLastCell).Row
'Check if "Test String" is found in Column 1
TCount = Application.WorksheetFunction.CountIf(sht.Columns(1), "Test String")
If TCount > 0 Then
'loop through to the End row
While r <= EndRow
If InStr(sht.Cells(r, 1).Text, "Test String") > 0 Then
sht.Rows(r).Delete Shift:=xlUp
r = r - 1
End If
r = r + 1
Wend
End If
e = Now 'End Time
D = (Hour(e) * 360 + Minute(e) * 60 + Second(e)) - (Hour(s) * 360 + Minute(s) * 60 + Second(s))
Application.ScreenUpdating = True
DurationTime = TimeSerial(0, 0, D)
MsgBox Format(DurationTime, "hh:mm:ss")
End Sub