我正在使用相当大的数据集(> 100,000行)并尝试比较两个列表,以确定新列表中的哪些项目尚未在主列表中。换句话说,我想找到新的独特物品。
我有一些使用vlookup的VBA代码和可以工作的数组,但是当数组太大(~70,000)时会爆炸。所以我转向收藏品。但是,我在使用vlookup或匹配搜索集合时遇到困难。
Sub find_uniqueIDs()
Dim a As Long
Dim n As Long
Dim m As Variant
Dim oldnum As Long
Dim oldIDs As Variant
Dim oldcoll As New Collection
Dim newnum As Long
Dim newIDs As Variant
Dim newcoll As New Collection
oldnum = 75000
oldIDs = Range("A1", Range("A" & oldnum))
newnum = 45000 + 3
newIDs = Range("G3", Range("G" & newnum))
'Using arrays to search, but bombs out when oldnum or newnum are ~70000
For n = 1 To newnum - 3
m = Application.VLookup(newIDs(n, 1), oldIDs, 1, False)
If IsError(m) Then Range("E100000").End(xlUp).Offset(1, 0) = newIDs(n, 1)
Next n
'Using collections to search
For n = 1 To oldnum
On Error Resume Next
oldcoll.Add oldIDs(n, 1)
On Error GoTo 0
Next n
For m = 1 To newnum
On Error Resume Next
newcoll.Add newIDs(m, 1)
On Error GoTo 0
Next m
'This bit of code doesn't work
For a = 1 To newcoll.Count
If Application.VLookup(newcoll(a), oldcoll, 1, False) = "#N/A" Then _
Range("E100000").End(xlUp).Offset(1, 0) = newcoll(a)
Next a
End Sub
如何使用集合确定特定项目是否在主列表中?
答案 0 :(得分:1)
这是一个简短的子程序,演示了一些脚本字典方法。
Sub list_New_Unique()
Dim dMASTER As Object, dNEW As Object, k As Variant
Dim v As Long, vVALs() As Variant, vNEWs() As Variant
Debug.Print "Start: " & Timer
Set dMASTER = CreateObject("Scripting.Dictionary")
Set dNEW = CreateObject("Scripting.Dictionary")
dMASTER.comparemode = vbTextCompare
dNEW.comparemode = vbTextCompare
With Worksheets("Sheet7")
vVALs = .Range("A2:A100000").Value2
vNEWs = .Range("C2:C100000").Value2
End With
'populate the dMASTER values
For v = LBound(vVALs, 1) To UBound(vVALs, 1)
dMASTER.Add Key:=vVALs(v, 1), Item:=vVALs(v, 1)
Next v
'only populate dNEW with items not found in dMASTER
For v = LBound(vNEWs, 1) To UBound(vNEWs, 1)
If Not dMASTER.exists(vNEWs(v, 1)) Then
If Not dNEW.exists(vNEWs(v, 1)) Then _
dNEW.Add Key:=vNEWs(v, 1), Item:=vNEWs(v, 1)
End If
Next v
Debug.Print dNEW.Count
For Each k In dNEW.keys
'Debug.Print k
Next k
Debug.Print "End: " & Timer
dNEW.RemoveAll: Set dNEW = Nothing
dMASTER.RemoveAll: Set dMASTER = Nothing
End Sub
A2中有99,999个唯一条目:A100000和C2中的89747个随机条目:C89747,这发现在A2:A100000中找不到70,087个独特的新条目,时间为9.87秒。
答案 1 :(得分:0)
VLookup
是worksheet function,而不是常规的VBA函数,因此it's for searching in Range
s, not Collection
s.
语法:
VLOOKUP (lookup_value, table_array, col_index_num, [range_lookup])
[...]
table_array
(必填): 单元格范围,其中VLOOKUP将搜索lookup_value和返回值。
为了搜索其他VBA数据结构,如数组,集合等,您必须找出其他方法,并可能手动实现。
答案 2 :(得分:0)
我会这样做:
Sub test()
Dim newRow As Long, oldRow As Long
Dim x As Long, Dim y As Long
Dim checker As Boolean
With ActiveSheet
newRow = .Cells(.Rows.Count,7).End(xlUp).Row
oldRow = .Cells(.Rows.Count,1).End(xlUp).Row
checker = True
for y = 1 To oldRow
for x = 1 To newRow
If .Cells(y,1).Value = .Cells(x,7).Value Then
checker = False
Exit For
End If
Next
If checker Then
Range("E10000").End(xlUp).Offset(1,0).Value = .Cells(y,1).Value
End If
checker = True
Next
End With
End Sub
答案 3 :(得分:0)
虽然@Jeeped建议使用Scripting.Dictionary对象可能是最好的,但您也可以尝试使用应用于数组的Filter()
function。