目前使用此代码,我有一大堆数据,而且运行速度非常慢。我需要删除任何重复的信息,并保留最高的信息行。
dim dup as variant, r as long, lncheckduplicatescolumn as long
With wb_DST.Sheets(sWs_DST)
lncheckduplicatescolumn = .Cells(.Rows.Count, "A").End(xlUp).row
for r = lncheckduplicatescolumn to 2 step -1
dup = application.match(.cells(r, "A").value, .columns(1), 0)
if dup < r then .rows(dup).delete
next r
end with
数据:
Column A Column B
A 1
B 2
C 3
A 3
结果应该是:
B 2
C 3
A 3
A列中的数据顺序无关紧要,只要它是唯一的,并保留较高行号中的信息。虽然我共享的代码有效,但对于大型数据集来说它太慢了。
答案 0 :(得分:3)
另一种快速方法是使用Dictionary
对象。您可以检查列{A}中的任何值是否已存在于Dictionary
中。如果他们这样做(意味着它是重复的),那么不要每次都删除它们,这为代码的运行时间增加了很长时间。相反,您可以使用DelRng
对象,Range
使用Union
来合并多个重复的行。
稍后,您可以使用DelRng.Delete
一次删除整个ducplicates范围。
<强> 代码 强>
Option Explicit
Sub RemoveDuplicatesUsingDict()
Dim wb_DST As Workbook
Dim sWs_DST As String
' Dictionary variables
Dim Dict As Object
Dim DictIndex As Long, ExistIndex As Long
Dim DelRng As Range
Dim LastRow As Long, i As Long
' --- parameters for my internal testing ---
Set wb_DST = ThisWorkbook
sWs_DST = "Sheet1"
Application.ScreenUpdating = False
Set Dict = CreateObject("Scripting.Dictionary")
With wb_DST.Sheets(sWs_DST)
LastRow = .Cells(.Rows.Count, "A").End(xlUp).Row ' get last row with data in column "A"
For i = LastRow To 2 Step -1
If Not Dict.exists(.Range("A" & i).Value) Then ' value doesn't exists yet in Dictionary >> add this Key
Dict.Add .Range("A" & i).Value, .Range("A" & i).Value
Else ' value already exists in Dictionary >> add it to DelRng (later will delete the entire range)
If Not DelRng Is Nothing Then
Set DelRng = Application.Union(DelRng, .Rows(i)) ' add current row to existing DelRng
Else
Set DelRng = .Rows(i)
End If
End If
Next i
End With
' delete the entire range at 1-shot
If Not DelRng Is Nothing Then DelRng.Delete
Application.ScreenUpdating = True
End Sub
答案 1 :(得分:0)
快速使用数据字段数组
在一个范围内循环并不快 - 如果你用搜索数据创建一个数据字段数组,你可以加速它(array =#34; A&#34; - 参见1)并在其中循环。如果您的数据集增长,与上面显示的字典方法相比,这会变得更快,尽管它依赖于一种良好且可靠的方法。
搜索方法
我甚至添加了第二列以及相应的行号,因此您应该可以使用它们进行进一步操作。您也可以将结果写入另一张表。
代码 - 方法演示
Sub RemoveDuplicates()
Dim t As Double: t = Timer ' stop watch
Dim ws As Worksheet ' source sheet (object)
Dim i As Long ' row index
Dim a, arr, arr2 ' variant
Dim s As String, si As String
Const SEP = "|" ' delimiter
s = SEP: si = SEP
' 0) fully qualified range reference to source sheet
Set ws = ThisWorkbook.Worksheets("Sheet1")
' 1) write column A data to one based 2-dim data field array
a = ws.Range("A1:A" & ws.Cells(ws.Rows.Count, "A").End(xlUp).Row)
' 2) loop through data and check for duplicates in concatenated string s
For i = 2 To UBound(a)
If InStr(s, SEP & a(i, 1) & SEP) = 0 Then
If Len(a(i, 1)) > 0 Then
s = s & a(i, 1) & SEP
si = si & i & SEP
End If
End If
Next i
' 3) transform unique values to zero based 1-dim array
arr = Split(Mid(s, 2), SEP) ' duplicates string to array
arr2 = Split(Mid(si, 2), SEP) ' found row numbers
' 4) write result to column H2:H... ' <<< change target to wanted column
ws.Range("H:H").ClearContents '
ws.Range("H2:H" & (2 + UBound(arr))).Value = Application.Transpose(arr)
ws.Range("I2:I" & (2 + UBound(arr2))).Value = Application.Transpose(arr2)
Debug.Print UBound(arr) + 0 & " unique items found", Format(Timer - t, "0.00 seconds needed")
End Sub
=============================================== ==================
修改强>
版本2 - 包括使用唯一值覆盖原始数据
在这里,您会发现一个略有修改的版本,用35个列(A2:AI ..)覆盖具有唯一值的原始数据。
Sub RemoveDuplicates2()
' Edit: overwrite original data A2:AI{..} with unique values
Dim t As Double: t = Timer ' stop watch
Dim ws As Worksheet ' source sheet (object)
Dim i As Long ' row index
Dim a, arr, arr2 ' variant
Dim s As String, si As String
Const SEP = "|" ' delimiter
Const MyLastColumn = "AI" ' letter of last column (no 35) = "AI"
s = SEP: si = SEP
' fully qualified range reference to source sheet
Set ws = ThisWorkbook.Worksheets("Sheet1")
' write column A data to one based 2-dim data field array
a = ws.Range("A1:A" & ws.Cells(ws.Rows.Count, "A").End(xlUp).Row)
' loop through data and check for duplicates in concatenated string s
For i = 2 To UBound(a) ' For i = UBound(a) To 2 Step -1
If InStr(s, SEP & a(i, 1) & SEP) = 0 Then
If Len(Trim(a(i, 1))) > 0 Then
s = s & a(i, 1) & SEP
si = si & i & SEP
End If
End If
Next i
' write unique values to zero based 1-dim array (starts with index 0; last delimiter removed in this version)
arr2 = Split(Mid(si, 2, Len(si) - 2), SEP) ' found row numbers
' overwrite original data
For i = LBound(arr2) To UBound(arr2) ' starts with index 0!
s = "A" & arr2(i) & ":" & MyLastColumn & arr2(i)
arr = ws.Range(s) ' create 1-based 1-line (2-dim) array
s = "A" & i + 2 & ":" & MyLastColumn & i + 2 ' 0 + 2 = +2 ... start in row 2
ws.Range(s) = arr ' write back unique row values
Next i
s = "A" & UBound(arr2) + 3 & ":" & MyLastColumn & UBound(a) + 1
ws.Range(s).ClearContents ' clear rest of original data
Debug.Print UBound(arr2) + 1 & " unique items found", Format(Timer - t, "0.00 seconds needed") ' result
End Sub