vba中是否有内置功能以从一维数组中获取唯一值?怎么样才能摆脱重复?
如果没有,那么我如何从数组中获取唯一值?
答案 0 :(得分:51)
This post包含2个示例。我喜欢第二个:
Sub unique()
Dim arr As New Collection, a
Dim aFirstArray() As Variant
Dim i As Long
aFirstArray() = Array("Banana", "Apple", "Orange", "Tomato", "Apple", _
"Lemon", "Lime", "Lime", "Apple")
On Error Resume Next
For Each a In aFirstArray
arr.Add a, a
Next
For i = 1 To arr.Count
Cells(i, 1) = arr(i)
Next
End Sub
答案 1 :(得分:38)
没有内置功能可以从数组中删除重复项。 Raj的回答似乎很优雅,但我更喜欢使用词典。
Dim d As Object
Set d = CreateObject("Scripting.Dictionary")
'Set d = New Scripting.Dictionary
Dim i As Long
For i = LBound(myArray) To UBound(myArray)
d(myArray(i)) = 1
Next i
Dim v As Variant
For Each v In d.Keys()
'd.Keys() is a Variant array of the unique values in myArray.
'v will iterate through each of them.
Next v
编辑:根据Tomalak建议的答案,我将循环更改为使用LBound
和UBound
。
编辑:d.Keys()
是一个Variant数组,而不是Collection。
答案 2 :(得分:17)
我已经创建了更全面的基准测试。首先,正如@ChaimG指出的那样,早期绑定会产生很大的不同(我最初使用的是@ eksortso上面的代码,它使用了后期绑定)。其次,我原来的基准测试只包括创建唯一对象的时间,但是,它没有测试使用对象的效率。我这样做的意思是,如果我创建的对象非常快,如果我创建的对象很笨拙并且让我向前移动,那么这并不重要。
旧备注:事实证明,循环收集对象的效率非常低
事实证明,如果你知道如何做到这一点,那么循环收集可以非常有效(我没有)。正如@ChaimG(再次),在评论中指出,使用For Each
构造比简单地使用For
循环要好得多。为了给你一个想法,在更改循环结构之前,Collection2
Test Case Size = 10^6
的时间超过了1400(即~23分钟)。它现在只有0.195秒(超过7000倍)。
对于Collection
方法,有两次。第一个(我的原始基准Collection1
)显示创建唯一对象的时间。第二部分(Collection2
)显示了在对象上循环的时间(非常自然),以便像其他函数一样创建可返回数组。
在下面的图表中,黄色背景表示该测试用例最快,红色表示最慢("未测试"算法被排除)。 Collection
方法的总时间是Collection1
和Collection2
的总和。绿松石表示无论原始订单如何都是最快的。
下面是我创建的原始算法(我稍微修改了它,例如我不再实例化我自己的数据类型)。它在非常可观的时间内返回具有原始顺序的数组的唯一值,并且可以对其进行修改以采用任何数据类型。在IndexMethod
之外,它是非常大的数组的最快算法。
以下是此算法背后的主要观点:
以下是一个例子:
Let myArray = (86, 100, 33, 19, 33, 703, 19, 100, 703, 19)
1. (86, 100, 33, 19, 33, 703, 19, 100, 703, 19)
(1 , 2, 3, 4, 5, 6, 7, 8, 9, 10) <<-- Indexing
2. (19, 19, 19, 33, 33, 86, 100, 100, 703, 703) <<-- sort by values
(4, 7, 10, 3, 5, 1, 2, 8, 6, 9)
3. (19, 33, 86, 100, 703) <<-- remove duplicates
(4, 3, 1, 2, 6)
4. (86, 100, 33, 19, 703)
( 1, 2, 3, 4, 6) <<-- sort by index
以下是代码:
Function SortingUniqueTest(ByRef myArray() As Long, bOrigIndex As Boolean) As Variant
Dim MyUniqueArr() As Long, i As Long, intInd As Integer
Dim StrtTime As Double, Endtime As Double, HighB As Long, LowB As Long
LowB = LBound(myArray): HighB = UBound(myArray)
ReDim MyUniqueArr(1 To 2, LowB To HighB)
intInd = 1 - LowB 'Guarantees the indices span 1 to Lim
For i = LowB To HighB
MyUniqueArr(1, i) = myArray(i)
MyUniqueArr(2, i) = i + intInd
Next i
QSLong2D MyUniqueArr, 1, LBound(MyUniqueArr, 2), UBound(MyUniqueArr, 2), 2
Call UniqueArray2D(MyUniqueArr)
If bOrigIndex Then QSLong2D MyUniqueArr, 2, LBound(MyUniqueArr, 2), UBound(MyUniqueArr, 2), 2
SortingUniqueTest = MyUniqueArr()
End Function
Public Sub UniqueArray2D(ByRef myArray() As Long)
Dim i As Long, j As Long, Count As Long, Count1 As Long, DuplicateArr() As Long
Dim lngTemp As Long, HighB As Long, LowB As Long
LowB = LBound(myArray, 2): Count = LowB: i = LowB: HighB = UBound(myArray, 2)
Do While i < HighB
j = i + 1
If myArray(1, i) = myArray(1, j) Then
Do While myArray(1, i) = myArray(1, j)
ReDim Preserve DuplicateArr(1 To Count)
DuplicateArr(Count) = j
Count = Count + 1
j = j + 1
If j > HighB Then Exit Do
Loop
QSLong2D myArray, 2, i, j - 1, 2
End If
i = j
Loop
Count1 = HighB
If Count > 1 Then
For i = UBound(DuplicateArr) To LBound(DuplicateArr) Step -1
myArray(1, DuplicateArr(i)) = myArray(1, Count1)
myArray(2, DuplicateArr(i)) = myArray(2, Count1)
Count1 = Count1 - 1
ReDim Preserve myArray(1 To 2, LowB To Count1)
Next i
End If
End Sub
这是我使用的排序算法(更多关于此算法here)。
Sub QSLong2D(ByRef saArray() As Long, bytDim As Byte, lLow1 As Long, lHigh1 As Long, bytNum As Byte)
Dim lLow2 As Long, lHigh2 As Long
Dim sKey As Long, sSwap As Long, i As Byte
On Error GoTo ErrorExit
If IsMissing(lLow1) Then lLow1 = LBound(saArray, bytDim)
If IsMissing(lHigh1) Then lHigh1 = UBound(saArray, bytDim)
lLow2 = lLow1
lHigh2 = lHigh1
sKey = saArray(bytDim, (lLow1 + lHigh1) \ 2)
Do While lLow2 < lHigh2
Do While saArray(bytDim, lLow2) < sKey And lLow2 < lHigh1: lLow2 = lLow2 + 1: Loop
Do While saArray(bytDim, lHigh2) > sKey And lHigh2 > lLow1: lHigh2 = lHigh2 - 1: Loop
If lLow2 < lHigh2 Then
For i = 1 To bytNum
sSwap = saArray(i, lLow2)
saArray(i, lLow2) = saArray(i, lHigh2)
saArray(i, lHigh2) = sSwap
Next i
End If
If lLow2 <= lHigh2 Then
lLow2 = lLow2 + 1
lHigh2 = lHigh2 - 1
End If
Loop
If lHigh2 > lLow1 Then QSLong2D saArray(), bytDim, lLow1, lHigh2, bytNum
If lLow2 < lHigh1 Then QSLong2D saArray(), bytDim, lLow2, lHigh1, bytNum
ErrorExit:
End Sub
如果您的数据包含整数,下面是一个特殊的算法。它使用索引和布尔数据类型。
Function IndexSort(ByRef myArray() As Long, bOrigIndex As Boolean) As Variant
'' Modified to take both positive and negative integers
Dim arrVals() As Long, arrSort() As Long, arrBool() As Boolean
Dim i As Long, HighB As Long, myMax As Long, myMin As Long, OffSet As Long
Dim LowB As Long, myIndex As Long, count As Long, myRange As Long
HighB = UBound(myArray)
LowB = LBound(myArray)
For i = LowB To HighB
If myArray(i) > myMax Then myMax = myArray(i)
If myArray(i) < myMin Then myMin = myArray(i)
Next i
OffSet = Abs(myMin) '' Number that will be added to every element
'' to guarantee every index is non-negative
If myMax > 0 Then
myRange = myMax + OffSet '' E.g. if myMax = 10 & myMin = -2, then myRange = 12
Else
myRange = OffSet
End If
If bOrigIndex Then
ReDim arrSort(1 To 2, 1 To HighB)
ReDim arrVals(1 To 2, 0 To myRange)
ReDim arrBool(0 To myRange)
For i = LowB To HighB
myIndex = myArray(i) + OffSet
arrBool(myIndex) = True
arrVals(1, myIndex) = myArray(i)
If arrVals(2, myIndex) = 0 Then arrVals(2, myIndex) = i
Next i
For i = 0 To myRange
If arrBool(i) Then
count = count + 1
arrSort(1, count) = arrVals(1, i)
arrSort(2, count) = arrVals(2, i)
End If
Next i
QSLong2D arrSort, 2, 1, count, 2
ReDim Preserve arrSort(1 To 2, 1 To count)
Else
ReDim arrSort(1 To HighB)
ReDim arrVals(0 To myRange)
ReDim arrBool(0 To myRange)
For i = LowB To HighB
myIndex = myArray(i) + OffSet
arrBool(myIndex) = True
arrVals(myIndex) = myArray(i)
Next i
For i = 0 To myRange
If arrBool(i) Then
count = count + 1
arrSort(count) = arrVals(i)
End If
Next i
ReDim Preserve arrSort(1 To count)
End If
ReDim arrVals(0)
ReDim arrBool(0)
IndexSort = arrSort
End Function
以下是Collection(by @DocBrown)和Dictionary(by @eksortso)函数。
Function CollectionTest(ByRef arrIn() As Long, Lim As Long) As Variant
Dim arr As New Collection, a, i As Long, arrOut() As Variant, aFirstArray As Variant
Dim StrtTime As Double, EndTime1 As Double, EndTime2 As Double, count As Long
On Error Resume Next
ReDim arrOut(1 To UBound(arrIn))
ReDim aFirstArray(1 To UBound(arrIn))
StrtTime = Timer
For i = 1 To UBound(arrIn): aFirstArray(i) = CStr(arrIn(i)): Next i '' Convert to string
For Each a In aFirstArray ''' This part is actually creating the unique set
arr.Add a, a
Next
EndTime1 = Timer - StrtTime
StrtTime = Timer ''' This part is writing back to an array for return
For Each a In arr: count = count + 1: arrOut(count) = a: Next a
EndTime2 = Timer - StrtTime
CollectionTest = Array(arrOut, EndTime1, EndTime2)
End Function
Function DictionaryTest(ByRef myArray() As Long, Lim As Long) As Variant
Dim StrtTime As Double, Endtime As Double
Dim d As Scripting.Dictionary, i As Long '' Early Binding
Set d = New Scripting.Dictionary
For i = LBound(myArray) To UBound(myArray): d(myArray(i)) = 1: Next i
DictionaryTest = d.Keys()
End Function
以下是@IsraelHoletz提供的直接方法。
Function ArrayUnique(ByRef aArrayIn() As Long) As Variant
Dim aArrayOut() As Variant, bFlag As Boolean, vIn As Variant, vOut As Variant
Dim i As Long, j As Long, k As Long
ReDim aArrayOut(LBound(aArrayIn) To UBound(aArrayIn))
i = LBound(aArrayIn)
j = i
For Each vIn In aArrayIn
For k = j To i - 1
If vIn = aArrayOut(k) Then bFlag = True: Exit For
Next
If Not bFlag Then aArrayOut(i) = vIn: i = i + 1
bFlag = False
Next
If i <> UBound(aArrayIn) Then ReDim Preserve aArrayOut(LBound(aArrayIn) To i - 1)
ArrayUnique = aArrayOut
End Function
Function DirectTest(ByRef aArray() As Long, Lim As Long) As Variant
Dim aReturn() As Variant
Dim StrtTime As Long, Endtime As Long, i As Long
aReturn = ArrayUnique(aArray)
DirectTest = aReturn
End Function
这是比较所有功能的基准功能。您应该注意,由于内存问题,最后两种情况的处理方式略有不同。另请注意,我没有测试Collection
的{{1}}方法。出于某种原因,它返回了不正确的结果并且行为异常(我猜测收集对象对你可以放入多少东西有限制。我搜索过,我无法找到任何关于此的文献)
Test Case Size = 10,000,000
最后,这是生成上表的子。
Function UltimateTest(Lim As Long, bTestDirect As Boolean, bTestDictionary, bytCase As Byte) As Variant
Dim dictionTest, collectTest, sortingTest1, indexTest1, directT '' all variants
Dim arrTest() As Long, i As Long, bEquality As Boolean, SizeUnique As Long
Dim myArray() As Long, StrtTime As Double, EndTime1 As Variant
Dim EndTime2 As Double, EndTime3 As Variant, EndTime4 As Double
Dim EndTime5 As Double, EndTime6 As Double, sortingTest2, indexTest2
ReDim myArray(1 To Lim): Rnd (-2) '' If you want to test negative numbers,
'' insert this to the left of CLng(Int(Lim... : (-1) ^ (Int(2 * Rnd())) *
For i = LBound(myArray) To UBound(myArray): myArray(i) = CLng(Int(Lim * Rnd() + 1)): Next i
arrTest = myArray
If bytCase = 1 Then
If bTestDictionary Then
StrtTime = Timer: dictionTest = DictionaryTest(arrTest, Lim): EndTime1 = Timer - StrtTime
Else
EndTime1 = "Not Tested"
End If
arrTest = myArray
collectTest = CollectionTest(arrTest, Lim)
arrTest = myArray
StrtTime = Timer: sortingTest1 = SortingUniqueTest(arrTest, True): EndTime2 = Timer - StrtTime
SizeUnique = UBound(sortingTest1, 2)
If bTestDirect Then
arrTest = myArray: StrtTime = Timer: directT = DirectTest(arrTest, Lim): EndTime3 = Timer - StrtTime
Else
EndTime3 = "Not Tested"
End If
arrTest = myArray
StrtTime = Timer: indexTest1 = IndexSort(arrTest, True): EndTime4 = Timer - StrtTime
arrTest = myArray
StrtTime = Timer: sortingTest2 = SortingUniqueTest(arrTest, False): EndTime5 = Timer - StrtTime
arrTest = myArray
StrtTime = Timer: indexTest2 = IndexSort(arrTest, False): EndTime6 = Timer - StrtTime
bEquality = True
For i = LBound(sortingTest1, 2) To UBound(sortingTest1, 2)
If Not CLng(collectTest(0)(i)) = sortingTest1(1, i) Then
bEquality = False
Exit For
End If
Next i
For i = LBound(dictionTest) To UBound(dictionTest)
If Not dictionTest(i) = sortingTest1(1, i + 1) Then
bEquality = False
Exit For
End If
Next i
For i = LBound(dictionTest) To UBound(dictionTest)
If Not dictionTest(i) = indexTest1(1, i + 1) Then
bEquality = False
Exit For
End If
Next i
If bTestDirect Then
For i = LBound(dictionTest) To UBound(dictionTest)
If Not dictionTest(i) = directT(i + 1) Then
bEquality = False
Exit For
End If
Next i
End If
UltimateTest = Array(bEquality, EndTime1, EndTime2, EndTime3, EndTime4, _
EndTime5, EndTime6, collectTest(1), collectTest(2), SizeUnique)
ElseIf bytCase = 2 Then
arrTest = myArray
collectTest = CollectionTest(arrTest, Lim)
UltimateTest = Array(collectTest(1), collectTest(2))
ElseIf bytCase = 3 Then
arrTest = myArray
StrtTime = Timer: sortingTest1 = SortingUniqueTest(arrTest, True): EndTime2 = Timer - StrtTime
SizeUnique = UBound(sortingTest1, 2)
UltimateTest = Array(EndTime2, SizeUnique)
ElseIf bytCase = 4 Then
arrTest = myArray
StrtTime = Timer: indexTest1 = IndexSort(arrTest, True): EndTime4 = Timer - StrtTime
UltimateTest = EndTime4
ElseIf bytCase = 5 Then
arrTest = myArray
StrtTime = Timer: sortingTest2 = SortingUniqueTest(arrTest, False): EndTime5 = Timer - StrtTime
UltimateTest = EndTime5
ElseIf bytCase = 6 Then
arrTest = myArray
StrtTime = Timer: indexTest2 = IndexSort(arrTest, False): EndTime6 = Timer - StrtTime
UltimateTest = EndTime6
End If
End Function
<强>摘要强>
从结果表中我们可以看出Sub GetBenchmarks()
Dim myVar, i As Long, TestCases As Variant, j As Long, temp
TestCases = Array(1000, 5000, 10000, 20000, 50000, 100000, 200000, 500000, 1000000, 2000000, 5000000, 10000000)
For j = 0 To 11
If j < 6 Then
myVar = UltimateTest(CLng(TestCases(j)), True, True, 1)
ElseIf j < 10 Then
myVar = UltimateTest(CLng(TestCases(j)), False, True, 1)
ElseIf j < 11 Then
myVar = Array("Not Tested", "Not Tested", 0.1, "Not Tested", 0.1, 0.1, 0.1, 0, 0, 0)
temp = UltimateTest(CLng(TestCases(j)), False, False, 2)
myVar(7) = temp(0): myVar(8) = temp(1)
temp = UltimateTest(CLng(TestCases(j)), False, False, 3)
myVar(2) = temp(0): myVar(9) = temp(1)
myVar(4) = UltimateTest(CLng(TestCases(j)), False, False, 4)
myVar(5) = UltimateTest(CLng(TestCases(j)), False, False, 5)
myVar(6) = UltimateTest(CLng(TestCases(j)), False, False, 6)
Else
myVar = Array("Not Tested", "Not Tested", 0.1, "Not Tested", 0.1, 0.1, 0.1, "Not Tested", "Not Tested", 0)
temp = UltimateTest(CLng(TestCases(j)), False, False, 3)
myVar(2) = temp(0): myVar(9) = temp(1)
myVar(4) = UltimateTest(CLng(TestCases(j)), False, False, 4)
myVar(5) = UltimateTest(CLng(TestCases(j)), False, False, 5)
myVar(6) = UltimateTest(CLng(TestCases(j)), False, False, 6)
End If
Cells(4 + j, 6) = TestCases(j)
For i = 1 To 9: Cells(4 + j, 6 + i) = myVar(i - 1): Next i
Cells(4 + j, 17) = myVar(9)
Next j
End Sub
方法对于小于约500,000的情况非常有效,但在此之后,Dictionary
确实开始占主导地位。您会注意到,当顺序无关紧要并且您的数据由正整数组成时,不会与IndexMethod
算法进行比较(它会返回包含1000万个元素的数组中的唯一值。 1秒!!!难以置信!)。下面我分析了在不同情况下首选哪种算法。
案例1
您的数据包含整数(即整数,包括正数和负数):IndexMethod
案例2
您的数据包含少于200000个元素的非整数(即变量,双精度,字符串等):IndexMethod
案例3
您的数据包含超过200000个元素的非整数(即变量,双精度,字符串等):Dictionary Method
如果你必须选择一种算法,在我看来,Collection Method
方法仍然是最好的,因为它只需要几行代码,它是超级通用的,并且它是足够快。
答案 3 :(得分:2)
我不知道VBA中有任何内置功能。最好的方法是使用值作为键的集合,只有在值不存在时才添加它。
答案 4 :(得分:2)
不,没有内置的东西。自己动手:
Scripting.Dictionary
对象For
循环(务必使用LBound()
和UBound()
,而不是从0循环到x!)Exists()
。添加每个数组值(尚不存在)作为字典的键(CStr()
,因为键必须是字符串Scripting.Dictionary
)中,还将数组值本身存储到字典中。Keys()
(或Items()
)将字典的所有值作为新的,现在唯一的数组返回。答案 5 :(得分:1)
不,VBA没有此功能。您可以使用将项目作为键将每个项目添加到集合的技术。由于集合不允许重复键,因此结果是可以根据需要复制到数组的不同值。
您可能还需要更强大的功能。请参阅http://www.cpearson.com/excel/distinctvalues.aspx
上的不同值函数不同的值函数
将返回的VBA函数 a中不同值的数组 范围或输入值数组。
Excel有一些手动方法,例如 高级过滤器,用于获取列表 输入范围中的不同项目。 使用这种方法的缺点是 你必须手动刷新 输入数据更改时的结果。 而且,这些方法只适用于 范围,而不是值的数组,而不是 是功能,不能从中调用 工作表单元格或并入 数组公式。这个页面描述了一个 VBA函数名为DistinctValues 接受范围作为输入 或者数组和返回数据 结果包含一个数组 输入列表中的不同项目。 也就是说,所有的元素 重复删除。的顺序 输入元素被保留。命令 输出数组中的元素是 与输入中的顺序相同 值。可以调用该函数 从a数组输入的范围 工作表(请参阅此页面 有关数组公式的信息),或 来自一个数组公式中的单个 工作表单元格,或来自另一个VB 功能
答案 6 :(得分:0)
收集和词典解决方案都很好用,并且可以用于简短的方法,但是如果你想要速度,请尝试使用更直接的方法:
Function ArrayUnique(ByVal aArrayIn As Variant) As Variant
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' ArrayUnique
' This function removes duplicated values from a single dimension array
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Dim aArrayOut() As Variant
Dim bFlag As Boolean
Dim vIn As Variant
Dim vOut As Variant
Dim i%, j%, k%
ReDim aArrayOut(LBound(aArrayIn) To UBound(aArrayIn))
i = LBound(aArrayIn)
j = i
For Each vIn In aArrayIn
For k = j To i - 1
If vIn = aArrayOut(k) Then bFlag = True: Exit For
Next
If Not bFlag Then aArrayOut(i) = vIn: i = i + 1
bFlag = False
Next
If i <> UBound(aArrayIn) Then ReDim Preserve aArrayOut(LBound(aArrayIn) To i - 1)
ArrayUnique = aArrayOut
End Function
致电:
Sub Test()
Dim aReturn As Variant
Dim aArray As Variant
aArray = Array(1, 2, 3, 1, 2, 3, "Test", "Test")
aReturn = ArrayUnique(aArray)
End Sub
对于速度比较,这比字典解决方案快100倍到130倍,比收集速度快大约8000倍到13000倍。
答案 7 :(得分:0)
如果重复数据删除阵列的顺序对您无关紧要,您可以使用我的实用功能:
Function DeDupArray(ia() As String)
Dim newa() As String
ReDim newa(999)
ni = -1
For n = LBound(ia) To UBound(ia)
dup = False
If n <= UBound(ia) Then
For k = n + 1 To UBound(ia)
If ia(k) = ia(n) Then dup = True
Next k
If dup = False And Trim(ia(n)) <> "" Then
ni = ni + 1
newa(ni) = ia(n)
End If
End If
Next n
If ni > -1 Then
ReDim Preserve newa(ni)
Else
ReDim Preserve newa(1)
End If
DeDupArray = newa
End Function
Sub testdedup()
Dim m(5) As String
Dim m2() As String
m(0) = "Horse"
m(1) = "Cow"
m(2) = "Dear"
m(3) = "Horse"
m(4) = "Joke"
m(5) = "Cow"
m2 = DeDupArray(m)
t = ""
For n = LBound(m2) To UBound(m2)
t = t & n & "=" & m2(n) & " "
Next n
MsgBox t
End Sub
从测试函数中,它将产生以下重复数据删除数组:
“0 =亲爱的1 =马2 =笑话3 =牛”
答案 8 :(得分:0)
没有用于从阵列中删除重复项的VBA内置功能,但是您可以使用下一个功能:
Function RemoveDuplicates(MyArray As Variant) As Variant
With CreateObject("scripting.dictionary")
For Each item In MyArray
c00 = .Item(item)
Next
sn = .keys ' the array .keys contains all unique keys
MsgBox Join(.keys, vbLf) ' you can join the array into a string
RemoveDuplicates = .keys ' return an array without duplicates
End With
End Function
答案 9 :(得分:0)
通过 stdVBA,您可以使用:
uniqueValues = stdEnumerator.CreateFromArray(myArray).Unique().AsArray()
注意事项:
您还可以为某些 Unique
/任何实现 Collection
的对象获取 IEnumVARIANT
:
uniqueValues = stdEnumerator.CreateFromIEnumVARIANT(myCollection).Unique().AsCollection()
您还可以通过某个对象的属性获取 Unique
:
uniqueValues = stdEnumerator.CreateFromIEnumVARIANT(ThisWorkbook.Sheets).Unique(stdLambda("$1.range(""A1"").value")).AsCollection()