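I currently have a macro that runs through a list and deletes duplicate values (in a single column), but it has turned out to be very inefficient. For every entry it checks, it has to run through the entire column; my file currently has 50,000 entries, so this is no small task.
I think a simpler approach would be for the macro to check whether the value is already in an array. If it is, delete the row that the entry is in. If it is not, add the value to the array.
Could someone help with a basic outline for the macro? Thanks.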
Answer 0: (score: 3)
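The code below loops through your source data and stores it in an array, checking for duplicates as it goes. Once everything has been collected, it uses the array as a key to determine which rows to delete.
Because the deletions can trigger a large number of screen updates, be sure to turn screen updating off (included in the code).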
Sub Example()
    Application.ScreenUpdating = False
    Dim i As Long
    Dim k As Long
    Dim StorageArray() As String
    Dim iLastRow As Long
    iLastRow = ActiveSheet.Cells(ActiveSheet.Rows.Count, "A").End(xlUp).Row
    ReDim StorageArray(1 To iLastRow, 0 To 1)
    'loop through the column from row 1 to the last row
    For i = 1 To iLastRow
        'add each sheet value to the first column of the array
        StorageArray(i, 0) = ActiveSheet.Range("A" & i).Value
        '- keep the second column as "0" by default
        StorageArray(i, 1) = "0"
        '- as each item is added, loop through previously added items to see if it's a duplicate
        For k = 1 To i - 1
            If StorageArray(k, 0) = StorageArray(i, 0) Then
                'if it is a duplicate, set the second column of the array to "1"
                StorageArray(i, 1) = "1"
                Exit For
            End If
        Next k
    Next i
    'loop through the sheet backwards and delete rows that were marked for deletion
    For i = iLastRow To 1 Step -1
        If StorageArray(i, 1) = "1" Then
            ActiveSheet.Range("A" & i).EntireRow.Delete
        End If
    Next i
    Application.ScreenUpdating = True
End Sub
As requested, here is a similar approach that uses a Collection instead of an array for key indexing: (RBarryYoung)
Public Sub RemovecolumnDuplicates()
    Dim prev As Boolean
    prev = Application.ScreenUpdating
    Application.ScreenUpdating = False
    Dim i As Long, k As Long
    Dim v As Variant, sv As String
    Dim cl As Range, ws As Worksheet
    Set ws = ActiveSheet 'NOTE: This really should be a parameter ...
    Dim StorageArray As New Collection
    Dim iLastRow As Long
    iLastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    'loop through column from row 1 to the last row
    i = 1
    For k = 1 To iLastRow
        'add each sheet value to the collection
        Set cl = ws.Cells(i, 1)
        v = cl.Value
        sv = CStr(v)
        On Error Resume Next
        Err.Clear
        StorageArray.Add v, sv
        If Err.Number <> 0 Then
            'the key already exists, so this must be a duplicate: remove it
            cl.EntireRow.Delete
            'Note: our index doesn't change here, since all of the rows moved
        Else
            'not a duplicate, so go to the next row
            i = i + 1
        End If
        On Error GoTo 0
    Next k
    Application.ScreenUpdating = prev
End Sub
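Note that this method does not need to assume any data type or integer limit for the cell values in the column.
(Mea culpa: I had to type this by hand in Notepad because my Excel is busy running project tests right now, so there may be some spelling/syntax errors...)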
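To make the key trick in the Collection version easier to see on its own, here is a minimal sketch that isolates it. The names `TryAddUnique` and `DemoTryAddUnique` are hypothetical and not part of the answer above. The sketch relies on `Collection.Add` raising a run-time error when a key is already present, and it assumes each value can be converted with `CStr`.

```vba
' Hypothetical helper (not from the original answer):
' returns True if v was added, False if its key was already in the collection.
Private Function TryAddUnique(ByVal seen As Collection, ByVal v As Variant) As Boolean
    On Error Resume Next
    Err.Clear
    seen.Add v, CStr(v)              ' raises an error if the key already exists
    TryAddUnique = (Err.Number = 0)
    Err.Clear
    On Error GoTo 0
End Function

Public Sub DemoTryAddUnique()
    Dim seen As New Collection
    Debug.Print TryAddUnique(seen, "apple")   ' first occurrence
    Debug.Print TryAddUnique(seen, 42)        ' a number, stored under the key "42"
    Debug.Print TryAddUnique(seen, "apple")   ' repeated key, so the Add fails
End Sub
```

One design point to keep in mind: Collection string keys are compared without regard to case, so values that differ only in capitalization would be treated as duplicates by this approach.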
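Answer 1: (score: 1)
This is a follow-up to my comment. Looping through 50k records plus looping through the array would be overkill for such a simple operation.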
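As I mentioned in my comment, copy the values from the array to a new sheet. Then insert a blank column next to the 50k entries and do a Vlookup or CountIf. Once that is done, apply an AutoFilter and delete the duplicate entries in one go. Let's take an example to see how it works.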
Let's say we have an array with 1000 items, and on one sheet we have 50k rows of data. The code below works with 1000 items in the array and 50k rows of data.
Paste this code into a module (it took less than 5 seconds to complete).
Sub Sample()
    Dim ws As Worksheet, wstemp As Worksheet
    Dim LRow As Long
    Dim i As Long
    Dim Ar(1 To 1000) As Long
    Dim startTime As String, EndTime As String
    startTime = Format(Now, "hh:mm:ss")
    Set ws = Sheets("Sheet1")
    Set wstemp = Sheets.Add
    '~~> Creating a dummy array
    For i = 1 To 1000
        Ar(i) = i
    Next i
    '~~> Copy it to the new sheet
    wstemp.Range("A1:A1000").Value = Application.Transpose(Ar)
    With ws
        LRow = .Range("A" & .Rows.Count).End(xlUp).Row
        '~~> Insert a helper column and count matches against the temp sheet
        .Columns(2).Insert Shift:=xlToRight
        .Range("B1").Value = "For Deletion"
        .Range("B2:B" & LRow).FormulaR1C1 = "=COUNTIF('" & wstemp.Name & "'!C[-1],RC[-1])"
        '~~> Convert the formulas to values
        .Columns(2).Value = .Columns(2).Value
        '~~> Remove any filters
        .AutoFilterMode = False
        '~~> Filter, offset (to exclude headers) and delete visible rows
        With .Range("B1:B" & LRow)
            .AutoFilter Field:=1, Criteria1:="<>0"
            .Offset(1, 0).SpecialCells(xlCellTypeVisible).EntireRow.Delete
        End With
        '~~> Remove any filters
        .AutoFilterMode = False
        .Columns(2).Delete
    End With
    EndTime = Format(Now, "hh:mm:ss")
    MsgBox "The process started at " & startTime & " and finished at " & EndTime
End Sub
Answer 2: (score: 1)
For Excel 2007 and later: copy the array to a worksheet and use the RemoveDuplicates method:
Set ws = Worksheets.Add
ws.[A1].Resize(UBound(yourarray, 1), UBound(yourarray, 2)).Value = yourarray
ws.UsedRange.RemoveDuplicates Columns:=1, Header:=xlNo
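This assumes that the array's lower bound is 1, that the column to deduplicate is column 1, and that the list has no header. You can then find the bounds of the new range and read it back into the array (erase the current array first).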
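The round trip described above (write the array out, remove duplicates, read the result back) could be sketched roughly as follows. `DedupeArrayViaSheet` is a hypothetical helper name, not something from the answer. The sketch assumes a 1-based, two-dimensional Variant array with no header row, and it assumes column 1 has no blank value in the last surviving row, since the last row is located from column A.

```vba
' Sketch only: DedupeArrayViaSheet is a hypothetical helper name.
' Assumes arr is a 1-based, two-dimensional array with no header row.
Public Function DedupeArrayViaSheet(ByVal arr As Variant) As Variant
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim result As Variant
    Dim prevAlerts As Boolean

    ' Write the array to a scratch sheet and let Excel drop the duplicates
    Set ws = Worksheets.Add
    ws.Range("A1").Resize(UBound(arr, 1), UBound(arr, 2)).Value = arr
    ws.UsedRange.RemoveDuplicates Columns:=1, Header:=xlNo

    ' Find the new last row in column A and read the remaining block back.
    ' Note: if a single cell remains, .Value returns a scalar, not an array.
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    result = ws.Range("A1").Resize(lastRow, UBound(arr, 2)).Value

    ' Remove the scratch sheet without the confirmation prompt
    prevAlerts = Application.DisplayAlerts
    Application.DisplayAlerts = False
    ws.Delete
    Application.DisplayAlerts = prevAlerts

    DedupeArrayViaSheet = result
End Function
```

A call such as `yourarray = DedupeArrayViaSheet(yourarray)` would then replace the original array with the deduplicated one. This should be called from VBA code rather than from a worksheet formula, because it adds and deletes a sheet.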
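Answer 3: (score: 0)
I suggest populating your column and then using a formula to find the duplicates and delete them. I don't have your actual code (you didn't give us any).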
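Dim a As Range
Dim b As Range
Dim lastRow As Long
lastRow = Cells(Rows.Count, "A").End(xlUp).Row
Set a = Range("A1")       'last cell known to hold a unique value
Set b = a.Offset(1, 0)    'cell currently being compared against a
Do While b.Row <= lastRow
    If b.Value = a.Value Then
        b.Value = ""      'blank out the duplicate
    Else
        Set a = b         'new value, so compare later cells against it
    End If
    Set b = b.Offset(1, 0)
Loop
'Note: this only catches duplicates that sit in adjacent cells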
I'm sure you can put the filter in the code, or just re-populate the column before running the macro.