在Excel中检查重复的子字符串

时间:2015-11-09 22:45:36

标签: arrays excel vba duplicates compare

我试图找到一种方法来比较每个单元格中的第一个重要单词与下一个单元格中的第一个重要单词,如果第一个重要单词匹配,则删除第二个条目。例如,起始数据可能如下所示:

  • 通用电气

    通用电气公司

    通用电气公司

    微软

    Microsoft Corporation

    Microsoft服务器

    任天堂

    任天堂企业

结果应该看起来像这样:

  • 通用电气

    微软

    任天堂

到目前为止,我已经设置了遍历数​​据列的代码:

Sub CompanyNameConsolidate()
    Dim companyName As String
    Dim companyArray() As String
    Dim companyName2 As String
    Dim companyArray2() As String
    Dim totalArray() As String
    Dim wordCount As Integer
    Dim i As Integer
    Dim r As Range

With Sheets("Unassigned")

        Range("B1").Select

        Do Until IsEmpty(ActiveCell)

            companyName = Range("B" & ActiveCell.Row).Text
            companyName2 = ActiveCell.Offset(1, 0).Text
            companyArray = Split(companyName, " ")
            companyArray2 = Split(companyName2, " ")

            wordCount = UBound(companyArray) - LBound(companyArray)

            For i = 0 To wordCount

               If companyArray(i) = companyArray2(i) Then                 

        [*********HELP**********]

            Next

            ActiveCell.Offset(1, 0).Select
        Loop

    End With

End Sub

基本上,上面的代码将每个单元格中的子串与下一个单元格中的子串进行比较。不幸的是,就我而言。

棘手的是,一些公司名称可以有两个单词(通用电气),其他公司名称只能有一个单词(微软)。

您可以假设列表将按字母顺序排序,因此最短的名称(我想保留的名称)将始终位于顶部。

我有超过16,000个条目要经历和修复,所以我绝对必须有一个自动化的方法来做到这一点!

2 个答案:

答案 0 :(得分:1)

找到一个公司名称的常见恶魔,根据你的例子,这似乎只是删除了最后一个单词,如果它超过1个单词。

Dim listOfCompanies As New Collection
Dim companyName As String
Dim companyArray As Variant
Dim item As Variant
Dim i as Integer, j As Integer

'The 2 denotes column B, where i denotes the row
'You can change this outter loop to your specific needs, this one just processes the first column B1, to when it encounters a blank row
while(ThisWorkbook.Worksheets("Unassigned").Cells(i, 2).Value <> "")
    companyName = ThisWorkbook.Worksheets("Unassigned").Cells(i, 2).Value
    companyArray = Split(companyName, " ")

    companyName = ""
    'This truncates the last word off
    for j = 0 to UBound(companyArray) - 1
        companyName = companyName + companyArray(j) + " "
    next j

    'Trim off the last space character
    companyName = Trim(companyName)

    'Now Add your companyName string to a Dictionary Object
    'VBA will throw an error if a duplicate gets added, but this is okay and we can continue processing
    On Error Resume Next
    listOfCompanies.Add(companyName)
    On Error Goto 0 'This resets the handler in case an error occurs somewhere else unexpectedly

    i = i + 1
wend

'Now we can do a ForEach and spit out the entire 'unique list'
For Each item in listOfCompanies
    'Your code here
Next item

答案 1 :(得分:1)

使用行删除的第一个变体:

Sub test()
    Dim Dic As Object: Set Dic = CreateObject("Scripting.Dictionary")
    Dim x&, cl As Range, DataRange As Range, k1, k2
    Dic.comparemode = vbTextCompare
    With Sheets("Unassigned")
        Set DataRange = .[B1].Resize(.Cells(Rows.Count, "B").End(xlUp).Row, 2)
        x = 1
        For Each cl In DataRange
            If cl.Value <> "" Then
                Dic.Add x, cl.Value
                x = x + 1
            End If
        Next cl
        For Each k1 In Dic
            For Each k2 In Dic
            If IsNumeric(k1) And IsNumeric(k2) Then
                If Dic(k2) Like Dic(k1) + "*" And k2 > k1 Then
                    Dic.Remove (k2)
                End If
                If Not Dic.exists(Dic(k1)) Then Dic.Add Dic(k1), Nothing
            End If
        Next k2, k1
        x = Split(DataRange.Address, "$")(4)
        While x <> 0
            If Not Dic.exists(.Cells(x, "B").Value) Then .Rows(x).Delete
            x = x - 1
        Wend
    End With
End Sub

使用Workbook.Add的第二个变体:

Sub test2()
    Dim Dic As Object: Set Dic = CreateObject("Scripting.Dictionary")
    Dim x&, cl As Range, DataRange As Range, k1, k2
    Dic.comparemode = vbTextCompare
    With Sheets("Unassigned")
        Set DataRange = .[B1].Resize(.Cells(Rows.Count, "B").End(xlUp).Row, 2)
        x = 1
        For Each cl In DataRange
            If cl.Value <> "" Then
                Dic.Add x, cl.Value
                x = x + 1
            End If
        Next cl
        For Each k1 In Dic
            For Each k2 In Dic
                If Dic(k2) Like Dic(k1) + "*" And k2 > k1 Then
                    Dic.Remove (k2)
                End If
        Next k2, k1
    End With
    Workbooks.Add
    x = 1
    For Each k1 In Dic
        Cells(x, 2) = Dic(k1)
        x = x + 1
    Next k1
End Sub

测试两种变体

之前: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~&gt; 之后: enter image description here enter image description here