Excel VBA删除重复重复与筛选

时间:2015-12-02 19:15:07

标签: excel vba excel-vba

我有一个工作表,其中包含已经过滤的~8,000行。我试图从工作表列中获取一组值,并删除重复项。通过这里的帖子阅读有两种方法可以做到这一点。如果新集合中尚不存在该值,则循环访问集合并复制到新集合 或者将列中的数据复制到临时电子表格中,过滤并将数据复制到另一列,然后将其添加到集合中。

当处理大量数据时,复制过滤器具有最佳性能,但由于必须创建新工作表,因此很笨重。

我还没有看到它,但有没有办法在内存中进行复制过滤而不是创建工作表来执行此操作?

复现:

Sub GetColumnValues(Ws As Worksheet, Column As Long, CollValues As Collection)

Dim RowIndex As Long

    For RowIndex = 1 To GetLastRow(Ws)
        If CollValues.Count = 0 Then
            CollValues.Add (Ws.Cells(RowIndex, Column).Value)
        Else
            If IsInCollection(CollValues, Ws.Cells(RowIndex, Column).Value) = False Then
                CollValues.Add (Ws.Cells(RowIndex, Column).Value)
            End If
        End If
    Next RowIndex

End Sub

过滤和复制

Sub GetColumnValues(Ws As Worksheet, Column As Long, CollValues As Collection)

Dim rowLast As Long
Dim c As Range
Dim tmpWS As Worksheet
Dim tmpWsName As String

    tmpWsName = "TempWS"

    Call DeleteWs(TsWb, tmpWsName)

    Set tmpWS = TsWb.Sheets.Add()
    tmpWS.Name = tmpWsName

    rowLast = GetLastRow(Ws)

    Ws.Range(Ws.Cells(1, Column), Ws.Cells(rowLast, Column)).Copy
    tmpWS.Range("A1").PasteSpecial

    rowLast = GetLastRow(tmpWS)
    tmpWS.Range(tmpWS.Cells(1, 1), tmpWS.Cells(rowLast, 1)).AdvancedFilter _
        Action:=xlFilterCopy, _
        CopyToRange:=tmpWS.Range("B1"), _
        Unique:=True

    rowLast = GetLastRow(tmpWS)

    For Each c In tmpWS.Range(tmpWS.Cells(1, 2), tmpWS.Cells(rowLast, 2))
        If Len(c.value) > 0 Then
            CollValues.Add (c.value)
        End If
    Next c

    Call DeleteWs(TsWb, tmpWsName)
End Sub

2 个答案:

答案 0 :(得分:0)

是的,只需创建数组,然后检查数组,然后将结果发送回工作表。就个人而言,我喜欢在内存中而不是通过Application IDE。

速度更快(特别是成千上万行),您不必担心屏幕刷新,或让您的用户想知道一切都在快速移动的情况。我通常处理内存中的所有内容,将其移回,然后激活我希望用户看到的工作表。

dim set1Array() as String
dim set2Array() as String
dim set1Rows as Long
dim set2Rows as Long
dim lngX as Long
dim lngY as Long
dim blnDebug as Boolean; blnDebug = true ' flag for debugging

' get count of rows so we know how big to make the arrays
set1Rows = GetLastRow(Ws1)
set2Rows = GetLastRow(Ws2)

' set arrays to the proper size
redim set1Rows(set1Rows - 1, 1)' 1 represents 2 columns since it's 0 based. the second column is a flag for duplicated.
redim set2Rows(set2Rows - 1, 0)' 0 represents 1 column since it's 0 based

' load the arrays with the sheet data
for lngX = 1 to set1Rows
  set1Rows(lngX - 1, 0) = Worksheets("Sheet1").range("A" & lngX).Text
next lngX

for lngX = 1 to set2Rows
  set2Rows(lngX - 1, 0) = Worksheets("Sheet2").range("A" & lngX).Text
next lngX

' I like to do a debug callout here to see what I got to make sure that I am good to go with the dataset
if blnDebug then
  for lngX = 0 to Ubound(set1Rows)
    debug.print "set1Rows(" & lngX & ") - col1: " & set1Rows(lngX, 0)
  next lngX

  for lngX = 0 to Ubound(set2Rows)
    debug.print "set2Rows(" & lngX & ") - col1: " & set2Rows(lngX, 0)
  next lngX

end if

' now do your comparison

for lngX = 0 to Ubound(set1Rows)
  for lngY = 0 to Ubound(set2Rows)
    if set1Rows(lngX, 0) = set2Rows(lngY, 0) then
      set1Rows(lngX, 1) = "1"
    end if
  next lngY
next lngX

' now your duplicates are flagged in the set1Rows array

for lngX = 0 to Ubound(set1Rows)
  if set1Rows(lngX, 1) = "1" then
    ' code for duplicated
  else
    ' code for unique
  end if
next lngX

答案 1 :(得分:0)

我不知道为什么它必须是一个集合,但为了快速获得一个没有双倍(过滤列表)的所有值的数组,你可以这样做:(非常接近你的第一个例子)< / p>

Function GetColVal(Ws As Worksheet, Column As Long) As Variant
  Dim runner As Variant, outputVal() As Variant, i As Long
  ReDim outputVal(Ws.Range(Ws.Cells(1, Column), Ws.Cells(GetLastRow(Ws), Column)).SpecialCells(xlCellTypeVisible).Count)
  For Each runner In Ws.Range(Ws.Cells(1, Column), Ws.Cells(GetLastRow(Ws), Column)).SpecialCells(xlCellTypeVisible)
    If i = 0 Then
      outputVal(0) = runner.Value: i = 1
    Else
      If IsError(Application.Match(runner.Value, outputVal, 0)) Then outputVal(i) = runner.Value: i = i + 1
    End If
  Next
  ReDim Preserve outputVal(i - 1)
  GetColVal= outputVal
End Function

Application.Match是VBA中最快的函数之一,而IsInCollection可能非常慢...更好地运行For Each ...循环来添加集合中的所有内容而不是检查集合。 ..

Dim a As Variant
For Each a in GetColVal(Worksheets("SheetX"),7)
  MyCollection.Add a
Next

应该比你的例子快得多......我仍然建议不要使用集合,特别是如果你只是使用值...如果可能的话,最好直接使用GetColVal - 数组...
variantVariable = GetColVal(Worksheets("SheetX"),7)然后将变量变量用于您想要做的任何事情(您也可以将其直接粘贴到工作表中的某个位置)

输入工作表的简单输出就是:

Dim a As Variant
a = GetColVal(Worksheets("Sheet1"),13) 'values from sheet1 column M
'pasting in one row (starting at A1 in Sheet2)
ThisWorkbook.Sheets("Sheet2").Range(Cells(1, 1), Cells(1, ubound(a) + 1)).value = a
'pasting in one column (starting at C5 in Sheet4)
ThisWorkbook.Sheets("Sheet4").Range(Cells(5, 3), Cells(ubound(a) + 5, 3)).value = Application.Transpose(a)

修改

展示不同的东西:

Function GetColumnValues(ws As Worksheet, Column As Long) As Range
  With ws
    Dim srcRng As Range, outRng As Range, runRng1 As Range, runRng2 As Range, dBool As Boolean
    Set srcRng = .Range(.Cells(1, Column), .Cells(GetLastRow(ws), Column)).SpecialCells(xlCellTypeVisible)
    For Each runRng1 In a
      If outRng Is Nothing Then Set outRng = runRng1
      For Each runRng2 In outRng
        If Intersect(runRng1, runRng2) Is Nothing Then
          If runRng2.Value = runRng1.Value Then dBool = True: Exit For
        End If
      Next
      If dBool Then dBool = False Else Set outRng = Union(outRng, runRng1)
    Next
  End With
  Set GetColumnValues = outRng
End Function

使用此功能,您将获得一系列可以选择或复制到另一个位置的单元格(使用格式化和其他所有单元格)。您仍然可以使用For Each ...将所有元素添加到集合中。我也没有使用Match来避免&#34; Len&gt; 255&#34; -error