搜索馆藏

时间:2015-11-11 13:05:18

标签: excel vba excel-vba

我正在使用相当大的数据集(> 100,000行)并尝试比较两个列表,以确定新列表中的哪些项目尚未在主列表中。换句话说,我想找到新的独特物品。

我有一些使用vlookup的VBA代码和可以工作的数组,但是当数组太大(~70,000)时会爆炸。所以我转向收藏品。但是,我在使用vlookup或匹配搜索集合时遇到困难。

Sub find_uniqueIDs()

Dim a As Long
Dim n As Long
Dim m As Variant
Dim oldnum As Long
Dim oldIDs As Variant
Dim oldcoll As New Collection
Dim newnum As Long
Dim newIDs As Variant
Dim newcoll As New Collection

oldnum = 75000
oldIDs = Range("A1", Range("A" & oldnum))
newnum = 45000 + 3
newIDs = Range("G3", Range("G" & newnum))

'Using arrays to search, but bombs out when oldnum or newnum are ~70000
For n = 1 To newnum - 3
    m = Application.VLookup(newIDs(n, 1), oldIDs, 1, False)
    If IsError(m) Then Range("E100000").End(xlUp).Offset(1, 0) = newIDs(n, 1)
Next n

'Using collections to search
For n = 1 To oldnum
On Error Resume Next
    oldcoll.Add oldIDs(n, 1)
On Error GoTo 0
Next n

For m = 1 To newnum
On Error Resume Next
    newcoll.Add newIDs(m, 1)
On Error GoTo 0
Next m

'This bit of code doesn't work
For a = 1 To newcoll.Count
If Application.VLookup(newcoll(a), oldcoll, 1, False) = "#N/A" Then _
    Range("E100000").End(xlUp).Offset(1, 0) = newcoll(a)
Next a

End Sub

如何使用集合确定特定项目是否在主列表中?

4 个答案:

答案 0 :(得分:1)

这是一个简短的子程序,演示了一些脚本字典方法。

Sub list_New_Unique()
    Dim dMASTER As Object, dNEW As Object, k As Variant
    Dim v As Long, vVALs() As Variant, vNEWs() As Variant

    Debug.Print "Start: " & Timer

    Set dMASTER = CreateObject("Scripting.Dictionary")
    Set dNEW = CreateObject("Scripting.Dictionary")
    dMASTER.comparemode = vbTextCompare
    dNEW.comparemode = vbTextCompare

    With Worksheets("Sheet7")
        vVALs = .Range("A2:A100000").Value2
        vNEWs = .Range("C2:C100000").Value2
    End With

    'populate the dMASTER values
    For v = LBound(vVALs, 1) To UBound(vVALs, 1)
        dMASTER.Add Key:=vVALs(v, 1), Item:=vVALs(v, 1)
    Next v

    'only populate dNEW with items not found in dMASTER
    For v = LBound(vNEWs, 1) To UBound(vNEWs, 1)
        If Not dMASTER.exists(vNEWs(v, 1)) Then
            If Not dNEW.exists(vNEWs(v, 1)) Then _
                dNEW.Add Key:=vNEWs(v, 1), Item:=vNEWs(v, 1)
        End If
    Next v

    Debug.Print dNEW.Count

    For Each k In dNEW.keys
        'Debug.Print k
    Next k

    Debug.Print "End: " & Timer

    dNEW.RemoveAll: Set dNEW = Nothing
    dMASTER.RemoveAll: Set dMASTER = Nothing
End Sub

A2中有99,999个唯一条目:A100000和C2中的89747个随机条目:C89747,这发现在A2:A100000中找不到70,087个独特的新条目,时间为9.87秒。

答案 1 :(得分:0)

VLookupworksheet function,而不是常规的VBA函数,因此it's for searching in Ranges, not Collections.

  

语法: VLOOKUP (lookup_value, table_array, col_index_num, [range_lookup])

     

[...]

     

table_array(必填): 单元格范围,其中VLOOKUP将搜索lookup_value和返回值。

为了搜索其他VBA数据结构,如数组,集合等,您必须找出其他方法,并可能手动实现。

答案 2 :(得分:0)

我会这样做:

Sub test()

Dim newRow As Long, oldRow As Long
Dim x As Long, Dim y As Long
Dim checker As Boolean

With ActiveSheet

newRow = .Cells(.Rows.Count,7).End(xlUp).Row
oldRow = .Cells(.Rows.Count,1).End(xlUp).Row
checker = True

for y = 1 To oldRow

    for x = 1 To newRow

    If .Cells(y,1).Value = .Cells(x,7).Value Then

    checker = False

    Exit For

    End If

    Next

If checker Then

Range("E10000").End(xlUp).Offset(1,0).Value = .Cells(y,1).Value

End If

checker = True

Next

End With

End Sub

答案 3 :(得分:0)

虽然@Jeeped建议使用Scripting.Dictionary对象可能是最好的,但您也可以尝试使用应用于数组的Filter() function