Speed up sorting array and binary search

时间:2016-02-12 19:21:28

标签: arrays excel vba sorting binary

I am trying to compare two worksheets, both have the column "EMAIL" which I want to compare. One column contains emails which were sent out and another contains emails where people actually clicked on them.

I have defined the two columns as arrays, EmailList and ClickthroughsList, and for each position in the EmailList array I have an If loop to cycle through ClickthroughList till a match is found:

            For i = 1 To lRow
                EmailList(i) = ThisWorkbook.Sheets(2).Cells(i, col).Value
                Sheets(7).Cells(i, 1).Value = EmailList(i)

                If i = 1 Then
                Sheets(7).Cells(i, 2).Value = "Sent"
                Sheets(7).Cells(i, 5).Value = "Unique Clickthroughs"

                Else
                Sheets(7).Cells(i, 2).Value = 1

                    For bi = 1 To bRow
                    If EmailList(i) = ClickthroughsList(bi) Then
                    Sheets(7).Cells(i, 5).Value = 1
                    End If
                    Next bi

                End If

            Next i

If a match is found I would like it to indicate "1" on Sheets(7), as defined above. It means that that particular email has been clicked through.

When the array is 700k+ rows or above, for two arrays, this code takes a couple hours to run.

It has been suggested I could do a sort and then a binary search. However, I still need the position of the array in EmailList so that I can put a "1" beside it (and that row will contain more info particular to that email).

What comes to mind is to define a new sorted array, while keeping the old one, and somehow when I find a match in the new sorted array, match it back to the old one to know the position?

            Option Explicit
            Private wsSent As Worksheet
            Private aCell As Range, Rng As Range
            Private col As Long, lRow As Long
            Private colName As String
            Private i As Long
            Private EmailList() As String

            Private wsClickthroughs As Worksheet
            Private bCell As Range, bRng As Range
            Private bcol As Long, bRow As Long
            Private bcolName As String
            Private bi As Long
            Private ClickthroughsList() As String

            Sub EmailArrayClickthroughs()

            Application.ScreenUpdating = False

            GetClickthroughsArray
                '~~> Change this to the relevant sheet
                Set wsSent = ThisWorkbook.Sheets(2)

                With wsSent
                    Set aCell = .Range("A1:DZ1").Find(What:="EMAIL", LookIn:=xlValues, LookAt:=xlWhole, _
                                MatchCase:=False, SearchFormat:=False)

                    '~~> If Found
                    If Not aCell Is Nothing Then
                        col = aCell.Column
                        colName = Split(.Cells(, col).Address, "$")(1)
                        lRow = .Range(colName & .Rows.Count).End(xlUp).Row
                    '~~> This is your range
                        Set Rng = .Range(colName & "2:" & colName & lRow)

                    Else
                        MsgBox "EMAIL (Clickthroughs) Not Found"
                    End If
                End With

            ReDim EmailList(lRow)

            For i = 1 To lRow
                EmailList(i) = ThisWorkbook.Sheets(2).Cells(i, col).Value
                Sheets(7).Cells(i, 1).Value = EmailList(i)

                If i = 1 Then
                Sheets(7).Cells(i, 2).Value = "Sent"
                Sheets(7).Cells(i, 5).Value = "Unique Clickthroughs"

                Else
                Sheets(7).Cells(i, 2).Value = 1

                    For bi = 1 To bRow
                    If EmailList(i) = ClickthroughsList(bi) Then
                    Sheets(7).Cells(i, 5).Value = 1
                    End If
                    Next bi

                End If

            Next i

            Debug.Print Rng.Address

            Application.ScreenUpdating = True

            End Sub


            Sub GetClickthroughsArray()

                '~~> Change this to the relevant sheet
                Set wsClickthroughs = ThisWorkbook.Sheets(5)

                With wsClickthroughs
                    Set bCell = .Range("A1:DZ1").Find(What:="EMAIL", LookIn:=xlValues, LookAt:=xlWhole, _
                                MatchCase:=False, SearchFormat:=False)

                    '~~> If Found
                    If Not bCell Is Nothing Then
                        bcol = bCell.Column
                        bcolName = Split(.Cells(, bcol).Address, "$")(1)
                        bRow = .Range(bcolName & .Rows.Count).End(xlUp).Row
             '~~> This is your range
                        Set bRng = .Range(bcolName & "2:" & bcolName & bRow)
                    Else
                        MsgBox "EMAIL (opens) Not Found"
                    End If
                End With

            Debug.Print bRng.Address

            ReDim ClickthroughsList(bRow)

            For bi = 1 To bRow
                ClickthroughsList(bi) = ThisWorkbook.Sheets(5).Cells(bi, bcol).Value
            Next bi

            End Sub

3 个答案:

答案 0 :(得分:0)

If I understand what you're trying to do, you don't need to use VBA.

Just use the MATCH worksheet function. In the column that you want to see if the value is in the "ClickThrough" sheet, put

 =IF(ISNA(MATCH(colEmail, YourSheet!colEmail1:colEmail1000, 0)), "0", "1")

YourSheet needs to be the sheet name with the clicked through emails. colEmail needs to be whatever column has the "Email" that you are trying to find and 1000 is the number of records. You could instead just put "A:A" to get the entire column.

At the end it will look something like this:

= IF(IF(ISNA(MATCH(A2, mySheet!$A$2:$A$1000, 0)), "0", "1")

Then just copy it down the entire column.

答案 1 :(得分:0)

Ok this is a fairly simple fix. The worst WORST thing you can do in VBA for performance is loop through cells. To fix this create a variable range and assign the list to it. Assigning both lists to a variable in a single command should take less than a second even with 700K rows. Then when comparing them you will like compare in less that 30 seconds. If you need to write back to the rows depending on what the result of the comparison is, make an equally sized variable, write to that variable whatever you want to write. Then after everything is done dump that variable onto the spread sheet, through a loop or a range paste. I would guess you can get this down to under 2 min easily. Make it faster?

答案 2 :(得分:0)

Here's one approach for matching two lists:

explicit Nodes(std::initializer_list<T> ini) :
    m_numParams(ini.size())
{
    for (auto&& e : ini) {
        m_vNodes.push_back(std::make_shared<T>(e));
    }
}

...and this one is even faster (<2sec to check 700k against 7k items)

Sub TestMatch()

    Dim rng1 As Range, rng2 As Range, f

    Set rng1 = Range("C3:C22")
    Set rng2 = Range("F3:F19")

    f = "=IFERROR(MATCH(" & rng1.Address(False, False) & _
      "," & rng2.Address(True, True) & ",0),0)>0"

    Debug.Print f

    rng1.Offset(0, 1).Value = ActiveSheet.Evaluate(f)

End Sub