查找模式:识别一组单元格共有的字符串部分

时间:2018-06-06 12:39:21

标签: excel excel-vba vba

我不知道如何搜索这个或如何在没有示例的情况下解释。

我正在寻找一个excel函数来比较单元格字符串并识别它们共有的部分。

条件

  • 比较2个或更多单元格。
  • 只要共享2个单元格,就会识别出公共字符串。 *(即:如果比较超过2,它足以让2个单元格具有该字符串。并非所有比较的单元格都需要它。)*
  • 该字符串至少包含3个或更多字符,以避免单个字符和标记对。

示例

----------------------------------------------------------------------
|  Pattern  | Page URL 1           | Page URL 2     | Page URL 3     |
----------------------------------------------------------------------
|    test   | example.net/test/    | www.test.com   | www.notest.com |  
----------------------------------------------------------------------
|   q=age   | another.com?q=age    | test.com/q=age | test.com/q=lol |
----------------------------------------------------------------------

现在可能很明显,但我想要实现/分析的是,是否存在大型网址共有的字符串模式。

(原谅我试图画桌子的可怜尝试)

2 个答案:

答案 0 :(得分:1)

这并没有完全回答这个问题,但我认为它会为您提供获得它所需要的东西。试试看。将以下代码放在一个新的moudule中:

Public Sub FindStrings()
    Dim rng1 As Excel.Range
    Dim rng2 As Excel.Range

    Set rng1 = ActiveSheet.Range("A1")
    Set rng2 = ActiveSheet.Range("A2")

    Dim i As Integer
    Dim j As Integer
    Dim searchVal As String
    For i = 3 To Len(rng2)
        For j = 1 To Len(rng1)
            searchVal = Mid(rng1, j, i)
            If Len(searchVal) < i Then Exit For
            If InStr(1, rng2, searchVal) Then Debug.Print searchVal
        Next j
    Next i
End Sub

在单元格A1中输入 example.net/test
在单元格A2中输入 www.test.com

<强>结果

tes
est
test

<强>更新

我更新了代码以搜索至少4个字符而不是3个字符(如上所述)。此外,我猜你不会想要返回www..com等字符串,也不想要/.字符串。所以代码也将这些内容拉出来。此外,它会比较每个列组合。

Option Explicit
Public Sub CompareStrings()
    Dim Arr As Variant
    Dim i As Integer
    Dim j As Integer
    Dim StartRange As Excel.Range
    Dim SearchRange As Excel.Range
    Dim Counter As Integer
    Dim ComparableRange As Variant
    Dim Comparable As Integer
    Dim Compared As Integer
    Dim SearchVal As String

    Set StartRange = ActiveSheet.Range("A1")

    Counter = 0
    For Each ComparableRange In ActiveSheet.Range("A1:A2")
    Set SearchRange = Range(StartRange.Offset(Counter), Cells(StartRange.Offset(Counter).Row, Columns.Count).End(xlToLeft))
    Arr = Application.Transpose(Application.Transpose(SearchRange.Value))
    Debug.Print "Row " & SearchRange.Row & ":"
        For j = LBound(Arr) To UBound(Arr)
            For i = j + 1 To UBound(Arr)
                For Comparable = 4 To Len(Arr(j))
                    For Compared = 1 To Len(Arr(i))
                        SearchVal = Mid(Arr(j), Compared, Comparable)
                        If InStr(1, SearchVal, ".") = 0 Then
                            If InStr(1, SearchVal, "/") = 0 Then
                                If Len(SearchVal) < Comparable Then Exit For
                                If InStr(1, Arr(i), SearchVal) > 0 Then Debug.Print vbTab & SearchVal
                            End If
                        End If
                    Next Compared
                Next Comparable
            Next i
        Next j
        Counter = Counter + 1
    Next ComparableRange    
End Sub

test.com/q=ageanother.com?q=age进行比较时,您仍会得到以下结果:

q=ag
=age 
q=age 

......虽然我怀疑你只想要第三个。匹配字符串越长,获得的结果就越多。最后的结果是你可能想要的结果。

答案 1 :(得分:1)

将以下代码复制到模块中。阅读CommonString顶部的评论以供使用。

Option Explicit

Public Function CommonString(rng As Range, iMinLen As Integer, Optional strDelimiter As String = ",") As String
    'Finds the maximum number of cells (iMax) in "rng" that have a common substring of length at least "iMinLen".
    'The function returns a string with the format "iMax: substring1,substring2,substring3..."
    ' where substring1, substring2, etc. are unique substrings found in exactly iMax cells.
    'The output does not include any substrings of the unique substrings.
    'The delimter between substrings can be specified by the optional parameter "strDelimiter".
    'If no common substrings of length at least "iMinLen" are found, "CommonString" will return an empty string.
    Dim blnRemove() As Boolean
    Dim dicSubStrings As Object 'records the number of times substrings are found in pairwise string comparisons
    Dim iCandidates As Integer
    Dim iCol As Integer
    Dim iCurrCommon As Integer
    Dim iCurrLen As Integer
    Dim iMax As Integer
    Dim iMaxCommon As Integer
    Dim iNumStrings As Integer
    Dim iOutCount As Integer
    Dim iRow As Integer
    Dim iString1 As Integer
    Dim iString2 As Integer
    Dim iSubStr1 As Integer
    Dim iSubStr2 As Integer
    Dim lngSumLen As Long
    Dim str1D() As String
    Dim strCandidates() As String
    Dim strOut() As String
    Dim strSim() As String
    Dim strSub As String
    Dim vKey As Variant
    Dim vStringsIn() As Variant

    Set dicSubStrings = CreateObject("Scripting.Dictionary")
    vStringsIn = rng.Value
    iNumStrings = Application.CountA(rng)
    ReDim str1D(1 To iNumStrings)
    ' pull the strings into a 1-D array
    For iRow = 1 To UBound(vStringsIn, 1)
        For iCol = 1 To UBound(vStringsIn, 2)
            iCurrLen = Len(vStringsIn(iRow, iCol))

            If iCurrLen > 0 Then
                iString1 = iString1 + 1
                str1D(iString1) = vStringsIn(iRow, iCol)
                lngSumLen = lngSumLen + iCurrLen
            End If
        Next iCol
    Next iRow
    'initialize the array that will hold the substrings to output
    ReDim strOut(1 To lngSumLen - iNumStrings * (iMinLen - 1))
    'find common substrings from all pairwise combination of strings
    For iString1 = 1 To iNumStrings - 1
        For iString2 = iString1 + 1 To iNumStrings
            strSim = Sim2Strings(str1D(iString1), str1D(iString2), iMinLen)
            'loop through all common substrings
            For iSubStr1 = 1 To UBound(strSim)
                If dicSubStrings.Exists(strSim(iSubStr1)) Then
                    iCurrCommon = dicSubStrings(strSim(iSubStr1)) + 1
                    dicSubStrings(strSim(iSubStr1)) = iCurrCommon
                    If iCurrCommon > iMaxCommon Then iMaxCommon = iCurrCommon
                Else    'add common substrings to the "dicSubStrings" dictionary
                    dicSubStrings.Add strSim(iSubStr1), 1
                    If iMaxCommon = 0 Then iMaxCommon = 1
                End If
            Next iSubStr1
        Next iString2
    Next iString1

    If dicSubStrings.Count = 0 Then Exit Function
    ReDim strCandidates(1 To dicSubStrings.Count)
    'add the candidate substrings to the "strCandidates" array
    'candidate substrings are those found in exactly "iMaxCommon" pairwise comparisons
    For Each vKey In dicSubStrings.keys
        If dicSubStrings(vKey) = iMaxCommon Then
            iCandidates = iCandidates + 1
            strCandidates(iCandidates) = CStr(vKey)
        End If
    Next vKey

    ReDim blnRemove(1 To iCandidates)
    iOutCount = iCandidates
    'keep only the candidate substrings that are not a substring within another candidate substring
    For iSubStr1 = 1 To iCandidates - 1
        If Not blnRemove(iSubStr1) Then
            For iSubStr2 = 1 To iCandidates - 1
                If Not blnRemove(iSubStr2) Then
                    If Len(strCandidates(iSubStr1)) <> Len(strCandidates(iSubStr2)) Then
                        If Len(strCandidates(iSubStr1)) > Len(strCandidates(iSubStr2)) Then
                            If InStr(strCandidates(iSubStr1), strCandidates(iSubStr2)) > 0 Then
                                blnRemove(iSubStr2) = True
                                iOutCount = iOutCount - 1
                            End If
                        Else
                            If InStr(strCandidates(iSubStr2), strCandidates(iSubStr1)) > 0 Then
                                blnRemove(iSubStr1) = True
                                iOutCount = iOutCount - 1
                            End If
                        End If
                    End If
                End If
            Next iSubStr2
        End If
    Next iSubStr1

    ReDim strOut(1 To iOutCount)
    iOutCount = 0
    'add the successful candidates to "strOut"
    For iSubStr1 = 1 To iCandidates
        If Not blnRemove(iSubStr1) Then
            iOutCount = iOutCount + 1
            strOut(iOutCount) = strCandidates(iSubStr1)
        End If
    Next iSubStr1
    'convert "iMaxCommon" (pairwise counts) to number of cells (iMax) by solving the formula:
    '(iMax ^ 2 - iMax) / 2 = iMaxCommon
    iMax = ((8 * iMaxCommon + 1) ^ 0.5 + 1) / 2
    CommonString = iMax & ": " & Join(strOut, strDelimiter)
End Function

Private Function Sim2Strings(str1 As String, str2 As String, iMinLen As Integer) As String()
    'Returns a list of unique substrings common to both "str1" and "str2" that
    ' have a length of at least "iMinLen".
    Dim dicInList As Object
    Dim iCharFrom As Integer
    Dim iLen1 As Integer
    Dim iSearchLen As Integer
    Dim iSubStr As Integer
    Dim strCurr As String
    Dim strList() As String
    Dim vKey As Variant

    iLen1 = Len(str1)
    Set dicInList = CreateObject("Scripting.Dictionary")
    'add common substrings to the "dicInList" dictionary
    For iCharFrom = 1 To iLen1 - iMinLen + 1
        For iSearchLen = iMinLen To iLen1 - iCharFrom + 1
            strCurr = Mid(str1, iCharFrom, iSearchLen)

            If InStr(str2, strCurr) = 0 Then
                Exit For
            Else
                If Not dicInList.Exists(strCurr) Then
                    dicInList.Add strCurr, 0
                End If
            End If
        Next iSearchLen
    Next iCharFrom

    If dicInList.Count = 0 Then
        ReDim strList(0)
    Else
        ReDim Preserve strList(1 To dicInList.Count)
        'output the keys in the "dicInList" dictionary to the "strList" array
        For Each vKey In dicInList.keys
            iSubStr = iSubStr + 1
            strList(iSubStr) = vKey
        Next vKey
    End If

    Sim2Strings = strList
End Function