匹配两个不同表中的部分文本字符串(90%)两列

时间:2016-12-09 11:09:58

标签: excel string vba match partial

我正在尝试将(90%)部分文本字符串从工作表列匹配到另一个工作表列,并将最终结果带到主工作表列。 我发现了一个VBA解决方案,但我遇到了一些问题。 1)它匹配精确的文本 2)找到匹配两个不同表格列的问题。

请帮我解决这个问题。

Sub lookup()
Dim TotalRows As Long
Dim rng As Range
Dim i As Long

'Copy lookup values from sheet1 to sheet3
Sheets("BANK STATEMENT ENTRY").Select
TotalRows = ActiveSheet.UsedRange.Rows.Count
Range("F3:F" & TotalRows).Copy Destination:=Sheets("TEST").Range("A1")

'Go to the destination sheet
Sheets("TEST").Select

For i = 1 To TotalRows
    'Search for the value on sheet2
    Set rng = Sheets("INFO").UsedRange.Find(Cells(i, 1).Value)
    'If it is found put its value on the destination sheet
    If Not rng Is Nothing Then
        Cells(i, 2).Value = rng.Value
    End If
Next
End Sub

1 个答案:

答案 0 :(得分:0)

我已经完成了一个文本挖掘项目,我知道你不能使用这种方法,你必须将字符串分解为子字符串,然后分析它们。这将是一个完整的项目,但你很幸运,因为我为你做了。

让我们简化问题并说你有两个字符串范围,并且你想找到两个组之间的每个相似的字符串。此外,您希望具有最小化匹配对的容差。

假设ABCDE和12BCD00。它们共有B,C,D,BC,CD和BCD。因此,最长的公共子串是BCD,其为3个字符:3 / ABCDE(5)的长度与第一个字符串的60%相似性和3/7 = 43%的相似性。因此,如果您可以在两个范围内的所有字符串中获取所有这些常见子字符串的列表,您可以提供更好的列表来过滤并获得您想要的内容。

我写了一堆函数。要轻松使用它,只需将两组字符串复制并粘贴到一个工作表中,并在同一工作表上生成最终报告,以了解其工作原理。

函数FuzzyFind,找到所有常见的子字符串,并为您提供Group1 / range1中的第一个字符串,group2 / range2中的第二个字符串,公共子字符串以及两个字符串的相似百分比。好的是你可以告诉函数你想要的子串有多小,例如:在前面的例子中,如果你说iMinCommonSubLength = 3,它只会给你BCD,如果你说iMinCommonSubLength = 2它会给你BC,CD和BCD等等。

使用功能Main。我还包括一个测试子。

<强>功能

Sub TestIt()
    Call Main(ActiveSheet.Range("A1:A10"), ActiveSheet.Range("B1:B10"), 4, ActiveSheet.Range("D1"))
End Sub

Sub Main(rng1 As Range, rng2 As Range, iMinCommonSubLength As Integer, Optional rngReportUpperLeftCell As Range)
    Dim arr() As Variant
    Dim rngReport As Range

    If rngReport Is Nothing Then Set rngReport = ActiveSheet.Range("A1")

    arr = FuzzyFind(rng1, rng2, iMinCommonSubLength)
    Set rngReport = rngReportUpperLeftCell.Resize(UBound(arr, 1), UBound(arr, 2))

    rngReport.Value = arr
    rngReport.Columns(1).NumberFormat = "@"
    rngReport.Columns(2).NumberFormat = "@"
    rngReport.Columns(3).NumberFormat = "@"
    rngReport.Columns(4).NumberFormat = "0%"
    rngReport.Columns(5).NumberFormat = "0%"
End Sub

Function GetCharacters(str As String) As Variant
    Dim arr() As String
    ReDim arr(Len(str) - 1)
    For i = 1 To Len(str)
        arr(i - 1) = Mid$(UCase(str), i, 1)
    Next
    GetCharacters = arr
End Function


Function GetIterations(iStringLength As Integer, iSubStringLength As Integer) As Integer

    If iStringLength >= iSubStringLength Then
        GetIterations = iStringLength - iSubStringLength + 1
    Else
        GetIterations = 0
    End If
End Function


Function GetSubtrings(str As String, iSubLength As Integer) As Variant
    Dim i As Integer
    Dim count As Integer
    Dim arr() As Variant

    count = GetIterations(Len(str), iSubLength)
    ReDim arr(1 To count)

    For i = 1 To count
        arr(i) = Mid(str, i, iSubLength)
    Next i

    GetSubtrings = arr()
End Function

Function GetLongestCommonSubStrings(str1 As String, str2 As String, iMinCommonSubLeng As Integer)
    Dim i As Integer
    Dim iLongestPossible As Integer
    Dim iShortest  As Integer

    Dim arrSubs() As Variant

    Dim arr1() As Variant
    Dim arr2() As Variant

    ReDim arrSubs(1 To 1)

    'Longest possible common substring length is the smaller string's length
    iLongestPossible = IIf(Len(str1) > Len(str2), Len(str2), Len(str1))

    If iLongestPossible < iMinCommonSubLeng Then
        'MsgBox "Minimum common substring length is larger than the shortest string." & _
        '   " You have to choose a smaller common length", , "Error"
    Else
        'We will try to find the first match of common substrings of two given strings, exit after the first match
        For i = iLongestPossible To iMinCommonSubLeng Step -1
            arr1 = GetSubtrings(str1, i)
            arr2 = GetSubtrings(str2, i)
            ReDim arrSubs(1 To 1)
            arrSubs = GetCommonElement(arr1, arr2)

            If arrSubs(1) <> "" Then Exit For 'if you want JUST THE LONGEST MATCH, comment out this line
        Next i
    End If

    GetLongestCommonSubStrings = arrSubs
End Function

Function GetCommonElement(arr1() As Variant, arr2() As Variant) As Variant
    Dim i As Integer
    Dim j As Integer
    Dim count As Integer
    Dim arr() As Variant

    count = 1
    ReDim arr(1 To count)

    For i = 1 To UBound(arr1)
        For j = 1 To UBound(arr2)
            If arr1(i) = arr2(j) Then
                ReDim Preserve arr(1 To count)
                arr(count) = arr1(i)
                count = count + 1
            End If
        Next j
    Next i

    GetCommonElement = arr
End Function

Function FuzzyFind(rng1 As Range, rng2 As Range, iMinCommonSubLength As Integer) As Variant
    Dim count As Integer
    Dim i As Integer
    Dim arrSubs As Variant
    Dim str1 As String
    Dim str2 As String
    Dim cell1 As Range
    Dim cell2 As Range
    Dim rngReport  As Range
    Dim arr() As Variant 'array of all cells that are partially matching, str1, str2, common string, percentage

    count = 1
    ReDim arr(1 To 5, 1 To count)

    For Each cell1 In rng1
        str1 = UCase(CStr(cell1.Value))
        If str1 <> "" Then
            For Each cell2 In rng2
                str2 = UCase(CStr(cell2.Value))
                If str2 <> "" Then
                    ReDim arrSubs(1 To 1)
                    arrSubs = GetLongestCommonSubStrings(str1, str2, iMinCommonSubLength)
                    If arrSubs(1) <> "" Then
                        For i = 1 To UBound(arrSubs)
                            arr(1, count) = cell1.Value
                            arr(2, count) = cell2.Value
                            arr(3, count) = arrSubs(i)
                            arr(4, count) = Len(arrSubs(i)) / Len(str1)
                            arr(5, count) = Len(arrSubs(i)) / Len(str2)
                            count = count + 1
                            ReDim Preserve arr(1 To 5, 1 To count)
                        Next i
                    End If
                End If
            Next cell2
        End If
    Next cell1

    FuzzyFind = TransposeArray(arr)

End Function


Function TransposeArray(arr As Variant) As Variant
   Dim arrTemp() As Variant
   ReDim arrTemp(LBound(arr, 2) To UBound(arr, 2), LBound(arr, 1) To UBound(arr, 1))
   For a = LBound(arr, 2) To UBound(arr, 2)
     For b = LBound(arr, 1) To UBound(arr, 1)
        arrTemp(a, b) = arr(b, a)
     Next b
   Next a
   TransposeArray = arrTemp
End Function    

在生成新报告之前,请不要忘记清除工作表。插入一张桌子并使用其自动过滤器轻松过滤您的东西。

最后但同样重要的是,不要忘记点击复选标记以宣布这是您问题的答案。