I have a dataset of about 50,000 rows. Each row (cell) holds values separated by commas, for example:
item 1, item 2, item 1, item 1, item3, item 2, item 4, item3
The desired output is simply:
item 1, item 2, item3, item 4
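I can use Excel, OpenOffice Calc, Notepad++, or any other freely available program. (I found a JavaScript solution, but it only handles a single string. Running it 50,000 times either doesn't work or would take longer than I have, and I don't know enough JS to adapt it.)
Any suggestions on how to do this?
EDIT: Note that some items will contain spaces.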
Answer (score: 4)
This should get you started. For better performance, turn off screen updating and automatic calculation while it runs.
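Sub Tester()
    ' For each cell in column A: split on commas, drop duplicate (trimmed)
    ' items, and write the sorted unique items to the cell in column B.
    Dim dict As Object
    Dim arrItems As Variant, c As Range, y As Long
    Dim itm As String          ' renamed from "val", which shadows VBA's Val()
    Dim lastRow As Long

    Set dict = CreateObject("Scripting.Dictionary")
    ' Cover every used row in column A instead of a fixed A1:A100 block
    lastRow = ActiveSheet.Cells(ActiveSheet.Rows.Count, "A").End(xlUp).Row
    For Each c In ActiveSheet.Range("A1:A" & lastRow).Cells
        arrItems = Split(c.Value, ",")
        dict.RemoveAll
        For y = LBound(arrItems) To UBound(arrItems)
            itm = Trim(arrItems(y))   ' strips outer spaces only, so "item 1" stays intact
            If Len(itm) > 0 Then      ' skip empty pieces such as "a,,b"
                If Not dict.Exists(itm) Then dict.Add itm, 1
            End If
        Next y
        ' ", " matches the delimiter used in the desired output
        c.Offset(0, 1).Value = Join(ArraySort(dict.Keys), ", ")
    Next c
End Sub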
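One way to apply the screen-updating and calculation advice is a small wrapper around Tester. This is a sketch, not part of the original answer: the name RunTesterFast is made up, and it assumes Tester sits in the same workbook. It restores the previous calculation mode even if Tester raises an error.

```vba
' Hypothetical wrapper: runs Tester with screen updating off and
' calculation set to manual, then restores both settings.
Sub RunTesterFast()
    Dim oldCalc As XlCalculation
    oldCalc = Application.Calculation      ' remember the user's setting

    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual

    On Error GoTo CleanUp                  ' make sure settings are restored on failure
    Tester

CleanUp:
    Application.Calculation = oldCalc
    Application.ScreenUpdating = True
    If Err.Number <> 0 Then MsgBox "Tester failed: " & Err.Description
End Sub
```

Run RunTesterFast from the Macros dialog in place of Tester; the per-cell writes to column B are where most of the time goes on 50,000 rows.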
A helper to sort the keys:
Function ArraySort(MyArray As Variant) As Variant
    ' Simple exchange sort, O(n^2); fine for the handful of items in one cell.
    ' Long instead of Integer avoids overflow past 32,767 elements.
    Dim First As Long
    Dim Last As Long
    Dim i As Long
    Dim j As Long
    Dim Temp As Variant
    First = LBound(MyArray)
    Last = UBound(MyArray)
    For i = First To Last - 1
        For j = i + 1 To Last
            If MyArray(i) > MyArray(j) Then
                ' Swap the out-of-order pair
                Temp = MyArray(j)
                MyArray(j) = MyArray(i)
                MyArray(i) = Temp
            End If
        Next j
    Next i
    ArraySort = MyArray
End Function
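If a worksheet formula is preferred over a macro, the same dictionary idea can be packaged as a user-defined function and filled down column B. The function name DedupeItems is hypothetical. Unlike Tester, this sketch does not sort: Scripting.Dictionary returns its keys in insertion order, so items come out in order of first appearance, which matches the sample target output in the question. (With ArraySort, "item 4" would land before "item3", because a space sorts ahead of a digit.)

```vba
' Hypothetical UDF: =DedupeItems(A1) returns the unique comma-separated
' items of A1, trimmed, in first-appearance order, joined with ", ".
Public Function DedupeItems(ByVal text As String) As String
    Dim dict As Object
    Dim parts As Variant
    Dim i As Long
    Dim itm As String

    Set dict = CreateObject("Scripting.Dictionary")
    parts = Split(text, ",")
    For i = LBound(parts) To UBound(parts)
        itm = Trim$(parts(i))               ' inner spaces ("item 1") are kept
        If Len(itm) > 0 Then
            If Not dict.Exists(itm) Then dict.Add itm, Empty
        End If
    Next i

    If dict.Count > 0 Then
        DedupeItems = Join(dict.Keys, ", ")
    Else
        DedupeItems = ""
    End If
End Function
```

Put it in a standard module, enter =DedupeItems(A1) in B1, and fill down; copy column B and paste as values afterwards if the formulas are no longer needed.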