Excel / Efficiently counting (exact) occurrences of a string in a specific table column with 35k rows

Time: 2017-01-23 09:59:43

Tags: excel excel-formula

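I have a table with 35,000 rows. Each row should have a unique (alphanumeric) ID, but I need to check that. Using COUNTIF takes an age to calculate. What is the most efficient approach (least calculation time)?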

[ID] | [Occurrences]
[A7A8S8D9] | [<formula>]
[F6F7F8F9] | [<formula>]

1 answer:

Answer 0: (score: 0)

If you don't mind a VBA solution, this will put 'Duplicate' next to every repeated ID except its first occurrence.

I tested it on 35,000 numeric IDs created with =RANDBETWEEN(1, 1000), and it ran in 671 milliseconds. It would probably run faster if the range were read into an array first.

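'Remove this declaration to remove timing.
'PtrSafe is required on 64-bit Office (VBA7); the #Else branch covers older hosts.
#If VBA7 Then
    Private Declare PtrSafe Function GetTickCount Lib "kernel32" () As Long
#Else
    Private Declare Function GetTickCount Lib "kernel32" () As Long
#End If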

'Will not mark first occurrence as duplicate.
Public Sub CheckForDuplicates()

    Dim rng As Range
    Dim rCell As Range
    Dim dict As Object

    'Remove next two lines to remove timing.
    Dim TC As Long
    TC = GetTickCount

    'Update sheet name as required.
    With ThisWorkbook.Worksheets("Sheet1")
        'Defines range as A2 to the last row containing data in column A.
        '-1 at end of line as we're starting on row 2.
        '.Rows.Count is qualified so the row count comes from this sheet, not the active one.
        Set rng = .Range("A2").Resize(.Cells(.Rows.Count, 1).End(xlUp).Row - 1)
    End With
    Set dict = CreateObject("Scripting.Dictionary")

    With dict
        For Each rCell In rng
            If .Exists(rCell.Value) Then
                rCell.Offset(, 1) = "Duplicate"
            Else
                .Add rCell.Value, rCell.Value
            End If
        Next rCell
    End With

    'Remove line to remove timing.
    MsgBox GetTickCount - TC & "ms elapsed."

End Sub
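
The answer suggests that passing the range to an array might be faster. A minimal sketch of that idea follows; it is untested for timing, so no speed-up figure is claimed. The procedure name CheckForDuplicatesArray and the variables ws, ids and flags are hypothetical, and it assumes the IDs sit in column A from row 2 on "Sheet1" with column B free for the flags.

```vba
'Sketch only: array-based variant of CheckForDuplicates (hypothetical name).
'Assumes IDs are in column A from row 2 on "Sheet1" and column B is free.
Public Sub CheckForDuplicatesArray()

    Dim ws As Worksheet
    Dim lastRow As Long
    Dim ids As Variant
    Dim flags() As Variant
    Dim dict As Object
    Dim i As Long

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row

    'Fewer than two IDs: nothing to compare.
    '(A single-cell read would also return a scalar rather than an array.)
    If lastRow < 3 Then Exit Sub

    'One read from the sheet into a 1-based, two-dimensional array.
    ids = ws.Range("A2:A" & lastRow).Value
    ReDim flags(1 To UBound(ids, 1), 1 To 1)

    Set dict = CreateObject("Scripting.Dictionary")

    For i = 1 To UBound(ids, 1)
        If dict.Exists(ids(i, 1)) Then
            'Seen before: flag every occurrence after the first.
            flags(i, 1) = "Duplicate"
        Else
            dict.Add ids(i, 1), Empty
        End If
    Next i

    'One write back to column B. Rows that are not duplicates are written as empty,
    'so any existing content in B2:B<lastRow> is overwritten.
    ws.Range("B2").Resize(UBound(flags, 1), 1).Value = flags

End Sub
```

The sketch touches the worksheet only twice (one read, one write) instead of once per cell. As in the original, the dictionary uses its default binary comparison, so "abc" and "ABC" count as different IDs, and a numeric 1 and the text "1" are treated as different keys.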