Excel macro arrays

Time: 2012-07-12 17:01:43

Tags: arrays excel vba excel-vba

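I currently have a macro that runs through a list and deletes duplicate values (in one column), but it is proving to be very inefficient. For every entry it checks for duplicates, it has to run through the whole column; my file currently has 50,000 entries, so this is no small task.

I think an easier way for the macro to work would be for it to check whether the value is already in an array. If it is, delete the row that the entry is in. If it isn't, add the value to the array.

Can someone provide some help with a basic outline of the macro? Thanks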

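4 answers:

Answer 0: (score: 3)

The code below loops through your source data and stores it in an array, checking for duplicates as it goes. Once the collection is complete, it uses the array as a key to know which rows to delete.

Because the deletions can trigger a large number of screen updates, be sure to turn screen updating off (included in the code).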

Sub Example()
    Application.ScreenUpdating = false
    Dim i As Long
    Dim k As Long
    Dim StorageArray() As String
    Dim iLastRow As Long
    iLastRow = ActiveSheet.Cells(ActiveSheet.Rows.Count, "A").End(xlUp).Row

    ReDim StorageArray(1 To iLastRow, 0 To 1)

    'loop through column from row 1 to the last row
    For i = 1 To iLastRow
        'add each sheet value to the first column of the array
        StorageArray(i, 0) = ActiveSheet.Range("A" & i).Value
        '- keep the second column as 0 by default
        StorageArray(i, 1) = 0
        '- as each item is added, loop through previously added items to see if it's a duplicate
        For k = 1 To i-1
            If StorageArray(k, 0) = StorageArray(i, 0) Then
                'if it is a duplicate, set the second column of the array to 1
                StorageArray(i, 1) = 1
                Exit For
            End If
        Next k
    Next i

    'loop through the sheet backwards and delete rows that were marked for deletion
    For i = iLastRow To 1 Step -1
        If StorageArray(i, 1) = 1 Then
            ActiveSheet.Range("A" & i).EntireRow.Delete
        End If
    Next i

    Application.ScreenUpdating = true
End Sub

As requested, here is a similar approach that uses a Collection instead of an Array for key indexing: (RBarryYoung)

Public Sub RemovecolumnDuplicates()
    Dim prev as Boolean
    prev = Application.ScreenUpdating
    Application.ScreenUpdating = false
    Dim i As Long, k As Long

    Dim v as Variant, sv as String
    Dim cl as Range, ws As Worksheet
    Set ws = ActiveSheet    'NOTE: This really should be a parameter ...

    Dim StorageArray As New Collection
    Dim iLastRow As Long
    iLastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    'loop through column from row 1 to the last row
    i = 1
    For k = 1 To iLastRow
        'add each sheet value to the collection
        Set cl = ws.Cells(i, 1)
        v = cl.Value
        sv = Cstr(v)

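        'adding an existing key raises an error, which is how duplicates are detected
        On Error Resume Next
        Err.Clear
        StorageArray.Add v, sv
        If Err.Number <> 0 Then
            'must be a duplicate, remove it
            Err.Clear
            cl.EntireRow.Delete
            'Note: our index doesn't change here, since all of the rows moved
        Else
            'not a duplicate, so go to the next row
            i = i + 1
        End If
        'restore normal error handling so unrelated errors are not swallowed
        On Error GoTo 0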
    Next k

    Application.ScreenUpdating = prev
End Sub

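Note that this method does not need to assume any data type or integer limit for the values of the cells in the column.

(Mea culpa: I had to type this by hand in Notepad because my Excel is busy running project tests right now, so there may be some spelling/syntax errors...)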
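
For comparison, here is a minimal sketch of the same "have I seen this value before?" idea using a `Scripting.Dictionary` instead of a Collection, so no error trapping is needed for the lookup. It is not part of either author's code above. It assumes the data is in column A of the active sheet, that the Windows scripting runtime is available (the dictionary is created late-bound), and that blank cells should count as duplicates of each other. The macro name `RemoveDuplicatesWithDictionary` is made up for illustration.

```vba
Sub RemoveDuplicatesWithDictionary()
    Dim ws As Worksheet
    Dim seen As Object          'late-bound Scripting.Dictionary (assumption: Windows)
    Dim data As Variant
    Dim toDelete As Range
    Dim lastRow As Long, i As Long
    Dim key As String

    Set ws = ActiveSheet        'assumption: the values are in column A of the active sheet
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    If lastRow < 2 Then Exit Sub    'zero or one value: nothing can be a duplicate

    'read the whole column into a 2-D Variant array in one call
    data = ws.Range("A1:A" & lastRow).Value
    Set seen = CreateObject("Scripting.Dictionary")

    For i = 1 To lastRow
        'cells holding error values are skipped rather than compared
        If Not IsError(data(i, 1)) Then
            key = CStr(data(i, 1))
            If seen.Exists(key) Then
                'duplicate: remember the row instead of deleting it right away
                If toDelete Is Nothing Then
                    Set toDelete = ws.Rows(i)
                Else
                    Set toDelete = Union(toDelete, ws.Rows(i))
                End If
            Else
                seen.Add key, True
            End If
        End If
    Next i

    'delete all collected rows in a single operation
    If Not toDelete Is Nothing Then toDelete.Delete
End Sub
```

Each value is looked up once, so the inner loop over earlier items goes away, and collecting the rows first means the sheet is only modified once at the end. Note that the dictionary compares keys case-sensitively by default, whereas Collection keys are case-insensitive.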

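Answer 1: (score: 1)

This is a follow-up to my comment. Looping through 50k records plus looping through the array would be overkill for such a simple operation.

Like I mentioned in my comment, copy the values from the array to a new sheet. Then insert a blank column next to the 50k entries and do a Vlookup or CountIf. Once done, apply an AutoFilter and delete the duplicate entries in one go. Let's take an example to see how it works.

Let's say we have an array with 1000 items, and in one sheet we have 50k rows of data. The code below is tested with 1000 items in the array and 50k rows of data. See the snapshot.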

enter image description here

Paste this code in a module (the code took less than 5 seconds to complete):

enter image description here

Sub Sample()
    Dim ws As Worksheet, wstemp As Worksheet
    Dim LRow As Long, i As Long
    Dim Ar(1 To 1000) As Long
    Dim startTime As String, EndTime As String

    startTime = Format(Now, "hh:mm:ss")

    Set ws = Sheets("Sheet1")
    Set wstemp = Sheets.Add

    '~~> Creating a dummy array
    For i = 1 To 1000
        Ar(i) = i
    Next i

    '~~> Copy it to the new sheet
    wstemp.Range("A1:A1000").Value = Application.Transpose(Ar)

    With ws
        LRow = .Range("A" & .Rows.Count).End(xlUp).Row

        .Columns(2).Insert Shift:=xlToRight
        .Range("B1").Value = "For Deletion"
        .Range("B2:B" & LRow).FormulaR1C1 = "=COUNTIF('" & wstemp.Name & "'!C[-1],RC[-1])"
        .Columns(2).Value = .Columns(2).Value

        '~~> Remove any filters
        .AutoFilterMode = False

        '~~> Filter, offset(to exclude headers) and delete visible rows
        With .Range("B1:B" & LRow)
            .AutoFilter Field:=1, Criteria1:="<>0"
            .Offset(1, 0).SpecialCells(xlCellTypeVisible).EntireRow.Delete
        End With

        '~~> Remove any filters
        .AutoFilterMode = False

        .Columns(2).Delete
    End With

    EndTime = Format(Now, "hh:mm:ss")

    MsgBox "The process started at " & startTime & " and finished at " & EndTime
End Sub

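Answer 2: (score: 1)

For Excel 2007 and later: copy the array to a worksheet and use the RemoveDuplicates method:

Set ws = Worksheets.Add
ws.[A1].Resize(UBound(yourarray, 1), UBound(yourarray, 2)).Value = yourarray
ws.UsedRange.RemoveDuplicates Columns:=1, Header:=xlNo

This assumes that the array's lower bound is 1, that the column to deduplicate is column 1, and that the list has no header. You can then find the bounds of the new range and read it back into the array (erase the current array first).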
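
The read-back step described above could look roughly like the sketch below. The function name `UniqueRows`, its parameter, and the choice to delete the temporary sheet afterwards are all assumptions added for illustration. It also assumes the array is 1-based and two-dimensional, and that column 1 has no trailing blank values (the last row is found from column 1).

```vba
Function UniqueRows(ByVal source As Variant) As Variant
    'assumption: source is a 1-based 2-D array, such as one read from Range.Value
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim prevAlerts As Boolean

    'write the array to a temporary sheet and let Excel drop the duplicates
    Set ws = Worksheets.Add
    ws.Range("A1").Resize(UBound(source, 1), UBound(source, 2)).Value = source
    ws.UsedRange.RemoveDuplicates Columns:=1, Header:=xlNo

    'find the new bounds and read the remaining rows back
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
    UniqueRows = ws.Range("A1").Resize(lastRow, UBound(source, 2)).Value

    'remove the temporary sheet without the confirmation prompt
    prevAlerts = Application.DisplayAlerts
    Application.DisplayAlerts = False
    ws.Delete
    Application.DisplayAlerts = prevAlerts
End Function
```

It would be called from another macro, for example `yourarray = UniqueRows(yourarray)`. If only a single cell remains, `Range.Value` returns a scalar rather than an array, so the caller may need to handle that case.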

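Answer 3: (score: 0)

I would suggest filtering your column, then using a formula to find the duplicates and delete them. I don't have actual code for you (you didn't give us any code).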

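'Compares each cell with the one below it, so it only catches duplicates
'that sit next to each other (the column needs to be sorted first).
Dim a As Range
Dim b As Range
Set a = Range("A1")

Do While Not IsEmpty(a)
    Set b = a.Offset(1, 0)

    If b.Value = a.Value Then
        'duplicate of the cell above: delete its row (the original only blanked the cell)
        b.EntireRow.Delete
    Else
        'not a duplicate, move down one cell
        Set a = b
    End If
Loop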

I'm sure you could put the filter in the code, or just remember to filter before you run the macro.