Remove duplicate values from a horizontal row or string

Time: 2012-06-27 18:43:19

Tags: excel vba duplicates

I have a dataset of about 50,000 rows, and each row (or cell) holds a list of comma-separated values, for example:

item 1, item 2, item 1, item 1, item3, item 2, item 4, item3

The desired output is just

item 1, item 2, item3, item 4

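I can use Excel, OpenOffice Calc, Notepad++, or any other freely available program. (I found a JavaScript solution, but it handles only a single string. Running it 50,000 times would either not work or take longer than I have, and I don't know enough JS to adapt it.)

Any suggestions on how to do this?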

Edit

Edited to note that some items will contain spaces.

1 answer:

Answer 0: (score: 4)

This should get you started. Turn off screen updating and calculation for better performance...

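Sub Tester()

    Dim dict As Object
    Dim arrItems, c As Range, y As Long
    Dim val

    ' Late-bound dictionary, used here as a set of unique items
    Set dict = CreateObject("scripting.dictionary")

    ' Adjust the range to cover your data (e.g. A1:A50000)
    For Each c In ActiveSheet.Range("A1:A100").Cells

        arrItems = Split(c.Value, ",")
        dict.RemoveAll   ' start each row with an empty set
        For y = LBound(arrItems) To UBound(arrItems)
            val = Trim(arrItems(y))   ' drop the spaces around each item
            If Not dict.exists(val) Then dict.Add val, 1
        Next y

        ' Write the sorted unique items to the cell on the right,
        ' joined with ", " to match the format in the question
        c.Offset(0, 1).Value = Join(ArraySort(dict.keys), ", ")

    Next c

End Sub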
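With 50,000 rows, writing one cell at a time can be slow. A rough sketch of a bulk variant follows: it reads column A into an array, deduplicates in memory, and writes column B in one step. The name DedupeAllRows is made up for illustration. It assumes the data starts in A1, that column B is free, and that no cell holds an error value. It skips the sort, so items stay in order of first appearance, relying on the dictionary returning keys in insertion order.

```vba
Sub DedupeAllRows()
    ' Hypothetical bulk variant of Tester (not part of the original answer).
    Dim dict As Object
    Dim ws As Worksheet
    Dim lastRow As Long, r As Long, y As Long
    Dim data As Variant, result() As Variant
    Dim arrItems As Variant, item As String

    Set ws = ActiveSheet
    ' Last used row in column A
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    ' A single cell's .Value is not an array, so bail out in that case
    If lastRow < 2 Then Exit Sub

    data = ws.Range("A1:A" & lastRow).Value   ' 1-based 2-D array
    ReDim result(1 To lastRow, 1 To 1)
    Set dict = CreateObject("Scripting.Dictionary")

    For r = 1 To lastRow
        dict.RemoveAll
        ' Assumes the cell is not an error value (CStr would fail on one)
        arrItems = Split(CStr(data(r, 1)), ",")
        For y = LBound(arrItems) To UBound(arrItems)
            item = Trim(arrItems(y))
            ' Skip empty pieces such as those left by a trailing comma
            If Len(item) > 0 And Not dict.Exists(item) Then dict.Add item, 1
        Next y
        result(r, 1) = Join(dict.Keys, ", ")
    Next r

    ' One write for the whole column instead of one per row
    ws.Range("B1:B" & lastRow).Value = result
End Sub
```

If sorted output is wanted, the Join line could pass dict.Keys through a sort routine first.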

To sort the keys:

Function ArraySort(MyArray As Variant)

    ' Simple in-place exchange sort; fine for the short lists in each cell
    Dim First           As Long
    Dim Last            As Long
    Dim i               As Long
    Dim j               As Long
    Dim Temp

    First = LBound(MyArray)
    Last = UBound(MyArray)
    For i = First To Last - 1
        For j = i + 1 To Last
            ' Swap so that the smaller value ends up at position i
            If MyArray(i) > MyArray(j) Then
                Temp = MyArray(j)
                MyArray(j) = MyArray(i)
                MyArray(i) = Temp
            End If
        Next j
    Next i
    ArraySort = MyArray

End Function
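The advice about turning off screen updating and calculation could look something like the wrapper below. This is only a sketch: RunTesterFast is a made-up name, and it assumes the Tester macro above is in the same project. It restores the previous settings even if Tester raises an error.

```vba
Sub RunTesterFast()
    ' Hypothetical wrapper around Tester (not part of the original answer).
    Dim prevCalc As XlCalculation
    Dim errMsg As String

    prevCalc = Application.Calculation   ' remember the current mode
    On Error GoTo CleanUp

    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual

    Tester

CleanUp:
    ' Capture any error text before restoring the settings
    If Err.Number <> 0 Then errMsg = Err.Description
    Application.Calculation = prevCalc
    Application.ScreenUpdating = True
    If Len(errMsg) > 0 Then MsgBox "Tester failed: " & errMsg
End Sub
```

Run RunTesterFast instead of Tester; the sheet redraws and recalculates once at the end rather than after every cell write.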