Question

目前使用此代码，我有一大堆数据，而且运行速度非常慢。我需要删除任何重复的信息，并保留最高的信息行。

dim dup as variant, r as long, lncheckduplicatescolumn as long
With wb_DST.Sheets(sWs_DST)
     lncheckduplicatescolumn = .Cells(.Rows.Count, "A").End(xlUp).row
     for r = lncheckduplicatescolumn  to 2 step -1
         dup = application.match(.cells(r, "A").value, .columns(1), 0)
             if dup < r then .rows(dup).delete
     next r
end with

数据：

       Column A              Column B
          A                      1
          B                      2
          C                      3
          A                      3

结果应该是：

           B                     2
           C                     3
           A                     3

A列中的数据顺序无关紧要，只要它是唯一的，并保留较高行号中的信息。虽然我共享的代码有效，但对于大型数据集来说它太慢了。

Answer 1

另一种快速方法是使用Dictionary对象。您可以检查列{A}中的任何值是否已存在于Dictionary中。如果他们这样做（意味着它是重复的），那么不要每次都删除它们，这为代码的运行时间增加了很长时间。相反，您可以使用DelRng对象，Range使用Union来合并多个重复的行。

稍后，您可以使用DelRng.Delete一次删除整个ducplicates范围。

<强> 代码

Option Explicit

Sub RemoveDuplicatesUsingDict()

Dim wb_DST As Workbook
Dim sWs_DST As String

'  Dictionary variables
Dim Dict As Object
Dim DictIndex As Long, ExistIndex As Long

Dim DelRng As Range
Dim LastRow As Long, i As Long

' --- parameters for my internal testing ---
Set wb_DST = ThisWorkbook
sWs_DST = "Sheet1"

Application.ScreenUpdating = False

Set Dict = CreateObject("Scripting.Dictionary")

With wb_DST.Sheets(sWs_DST)
     LastRow = .Cells(.Rows.Count, "A").End(xlUp).Row ' get last row with data in column "A"

     For i = LastRow To 2 Step -1
        If Not Dict.exists(.Range("A" & i).Value) Then ' value doesn't exists yet in Dictionary >> add this Key
            Dict.Add .Range("A" & i).Value, .Range("A" & i).Value
        Else ' value already exists in Dictionary >> add it to DelRng (later will delete the entire range)
            If Not DelRng Is Nothing Then
                Set DelRng = Application.Union(DelRng, .Rows(i)) ' add current row to existing DelRng
            Else
                Set DelRng = .Rows(i)
            End If
        End If
     Next i
End With

' delete the entire range at 1-shot
If Not DelRng Is Nothing Then DelRng.Delete

Application.ScreenUpdating = True

End Sub

Answer 2

快速使用数据字段数组

在一个范围内循环并不快 - 如果你用搜索数据创建一个数据字段数组，你可以加速它（array =＃34; A＆＃34; - 参见1）并在其中循环。如果您的数据集增长，与上面显示的字典方法相比，这会变得更快，尽管它依赖于一种良好且可靠的方法。

搜索方法

根据已找到唯一值的连接搜索字符串检查任何数组值，如果尚未包含，则添加（参见2）
完成的字符串转换为数组并写回给定的目标列（例如＆＃34; H＆＃34;） - 参见3）和4）

我甚至添加了第二列以及相应的行号，因此您应该可以使用它们进行进一步操作。您也可以将结果写入另一张表。

代码 - 方法演示

Sub RemoveDuplicates()
Dim t    As Double: t = Timer   ' stop watch
Dim ws   As Worksheet           ' source sheet (object)
Dim i    As Long                ' row index
Dim a, arr, arr2                ' variant
Dim s    As String, si As String
Const SEP = "|"                 ' delimiter
s = SEP: si = SEP
' 0) fully qualified range reference to source sheet
  Set ws = ThisWorkbook.Worksheets("Sheet1")
' 1) write column A data to one based 2-dim data field array
  a = ws.Range("A1:A" & ws.Cells(ws.Rows.Count, "A").End(xlUp).Row)
' 2) loop through data and check for duplicates in concatenated string s
  For i = 2 To UBound(a)
     If InStr(s, SEP & a(i, 1) & SEP) = 0 Then
        If Len(a(i, 1)) > 0 Then
           s = s & a(i, 1) & SEP
           si = si & i & SEP
        End If
     End If
  Next i
' 3) transform unique values to zero based 1-dim array
  arr = Split(Mid(s, 2), SEP)          ' duplicates string to array
  arr2 = Split(Mid(si, 2), SEP)        ' found row numbers
' 4) write result to column H2:H...      ' <<< change target to wanted column
  ws.Range("H:H").ClearContents         '
  ws.Range("H2:H" & (2 + UBound(arr))).Value = Application.Transpose(arr)
  ws.Range("I2:I" & (2 + UBound(arr2))).Value = Application.Transpose(arr2)
  Debug.Print UBound(arr) + 0 & " unique items found", Format(Timer - t, "0.00 seconds needed")

End Sub

=============================================== ==================

修改

版本2 - 包括使用唯一值覆盖原始数据

在这里，您会发现一个略有修改的版本，用35个列（A2：AI ..）覆盖具有唯一值的原始数据。

Sub RemoveDuplicates2() ' Edit: overwrite original data A2:AI{..} with unique values Dim t As Double: t = Timer ' stop watch Dim ws As Worksheet ' source sheet (object) Dim i As Long ' row index Dim a, arr, arr2 ' variant Dim s As String, si As String Const SEP = "|" ' delimiter Const MyLastColumn = "AI" ' letter of last column (no 35) = "AI" s = SEP: si = SEP ' fully qualified range reference to source sheet Set ws = ThisWorkbook.Worksheets("Sheet1") ' write column A data to one based 2-dim data field array a = ws.Range("A1:A" & ws.Cells(ws.Rows.Count, "A").End(xlUp).Row) ' loop through data and check for duplicates in concatenated string s For i = 2 To UBound(a) ' For i = UBound(a) To 2 Step -1 If InStr(s, SEP & a(i, 1) & SEP) = 0 Then If Len(Trim(a(i, 1))) > 0 Then s = s & a(i, 1) & SEP si = si & i & SEP End If End If Next i ' write unique values to zero based 1-dim array (starts with index 0; last delimiter removed in this version) arr2 = Split(Mid(si, 2, Len(si) - 2), SEP) ' found row numbers ' overwrite original data For i = LBound(arr2) To UBound(arr2) ' starts with index 0! s = "A" & arr2(i) & ":" & MyLastColumn & arr2(i) arr = ws.Range(s) ' create 1-based 1-line (2-dim) array s = "A" & i + 2 & ":" & MyLastColumn & i + 2 ' 0 + 2 = +2 ... start in row 2 ws.Range(s) = arr ' write back unique row values Next i s = "A" & UBound(arr2) + 3 & ":" & MyLastColumn & UBound(a) + 1 ws.Range(s).ClearContents ' clear rest of original data Debug.Print UBound(arr2) + 1 & " unique items found", Format(Timer - t, "0.00 seconds needed") ' result End Sub

VBA删除更快地复制代码

2 个答案: