excel

时间:2015-10-05 10:55:48

标签: regex excel vba

我有一张带有两列的excel表。第一列是关键短语,第二列是消息。关键短语可能出现在消息列中。我需要知道在消息列中发生了多少次关键短语。

关键短语是一列,消息是第二列。消息列是1个或多个1个关键短语的组合(串联)。我需要找出每条消息包含多少关键短语。还有一些消息带有一些日期和数字。此外,一些消息中包含日期和数字,匹配的关键短语作为当前的日期/数字为(xx-xxx-xxxx)。

e.g。消息是“交易于2014年10月8日结束,因此不允许进一步交易”,关键词是“交易已结束(xx-xxx-xxxx) ”。还有“交易号4238428DDSSD有问题”的消息,关键短语是“交易号xxxxxxxx hass问题”。正则表达式匹配是必需的。

1 个答案:

答案 0 :(得分:2)

您可以选择一些关键字短语,为它们创建正则表达式模式,然后对短语进行编码,以便可以在其上使用Range.Replace method来将相应的RegEx模式掩码替换为关键字短语。

在下文中,我使用 X00000000X XSHORTDATEX XDEALNMBRX 作为关键字中的占位符。这些将被替换为[0-9, - ] {7,8},[0-9, - ] {3} [az] {3} [0-9, - ] {3,5}和[0- 9] {7} [az] {5}。

X00000000X 旨在处理任何类似 1234567 或* 99-11-00 *的内容。 XSHORTDATEX 将处理dd-mmm-yy或dd-mmm-yyyy格式的日期(一旦转换为小写), XDEALNMBRX 将找到类似于的字母数字模式4238428DDSSD

  

此代码要求使用VBE的工具►参考命令将 Microsoft VBScript正则表达式库添加到VBA项目中。

Sub count_strings_inside_strings_rgx()
    Dim rw As Long, lr As Long
    Dim k As Long, p As Long, vKEYs As Variant, vPHRASEs As Variant, vCOUNTs As Variant
    Dim sPATTERN As String, vbaRGX As New RegExp, cMATCHES As MatchCollection

    ReDim vKEYs(0)
    ReDim vPHRASEs(0)

    With Worksheets("Sheet1")   '<~~ set to the correct worksheet name\
        'populate the vKEYs array
        For rw = 2 To .Cells(Rows.Count, 1).End(xlUp).Row
            vKEYs(UBound(vKEYs)) = LCase(.Cells(rw, 1).Value2)
            ReDim Preserve vKEYs(UBound(vKEYs) + 1)
        Next rw
        ReDim Preserve vKEYs(UBound(vKEYs) - 1)

        'populate the vPHRASEs array
        For rw = 2 To .Cells(Rows.Count, 2).End(xlUp).Row
            vPHRASEs(UBound(vPHRASEs)) = LCase(.Cells(rw, 2).Value2)
            ReDim Preserve vPHRASEs(UBound(vPHRASEs) + 1)
        Next rw
        ReDim Preserve vPHRASEs(UBound(vPHRASEs) - 1)
        ReDim vCOUNTs(0 To UBound(vPHRASEs))

        For p = LBound(vPHRASEs) To UBound(vPHRASEs)
            For k = LBound(vKEYs) To UBound(vKEYs)
                sPATTERN = Replace(vKEYs(k), "x00000000x", "[0-9,\-]{7,8}")
                sPATTERN = Replace(sPATTERN, "xshortdatex", "[0-9,\-]{3}[a-z]{3}[0-9,\-]{3,5}")
                sPATTERN = Replace(sPATTERN, "xdealnmbrx", "[0-9]{7}[a-z]{5}")
                sPATTERN = Replace(sPATTERN, "xshortwrapdatex", "\([0-9,\-]{3}[a-z]{3}[0-9,\-]{3,5}\)")
                With vbaRGX
                    .Global = True
                    .Pattern = sPATTERN
                    Set cMATCHES = .Execute(vPHRASEs(p))
                End With
                vCOUNTs(p) = vCOUNTs(p) + cMATCHES.Count
            Next k
        Next p

        .Cells(2, 3).Resize(UBound(vCOUNTs) + 1, 1) = Application.Transpose(vCOUNTs)

        Call key_in_phrase_helper_rgx(vKEYs, .Range(.Cells(2, 2), .Cells(Rows.Count, 2).End(xlUp)))

    End With

    Set cMATCHES = Nothing
    Set vbaRGX = Nothing

End Sub

Sub key_in_phrase_helper_rgx(vKYs As Variant, rPHRSs As Range)
    Dim c As Long, m As Long, p As Long, r As Long, v As Long, sPTTRN As String
    Dim vbaRGX As New RegExp, cMATCHES As MatchCollection

    With rPHRSs
        For r = 1 To rPHRSs.Rows.Count
            With .Cells(r, 1)
                .ClearFormats
                For v = LBound(vKYs) To UBound(vKYs)
                    sPTTRN = Replace(vKYs(v), "x00000000x", "[0-9,\-]{7,8}")
                    sPTTRN = Replace(sPTTRN, "xshortdatex", "[0-9,\-]{3}[a-z]{3}[0-9,\-]{3,5}")
                    sPTTRN = Replace(sPTTRN, "xdealnmbrx", "[0-9]{7}[a-z]{5}")
                    sPTTRN = Replace(sPTTRN, "xshortwrapdatex", "\([0-9,\-]{2,3}[a-z]{3}[0-9,\-]{3,5}\)")
                    c = 5 + CBool(vKYs(v) <> sPTTRN) * 2
                    Debug.Print sPTTRN
                    With vbaRGX
                        .Global = True
                        .Pattern = sPTTRN
                    End With
                    Set cMATCHES = vbaRGX.Execute(LCase(.Value2))
                    For m = 0 To cMATCHES.Count - 1
                        p = 0
                        Do While CBool(InStr(p + 1, .Value2, cMATCHES.Item(m), vbTextCompare))
                            p = InStr(p + 1, .Value2, cMATCHES.Item(m), vbTextCompare)
                            'Debug.Print vKYs(v)
                            With .Characters(Start:=p, Length:=Len(cMATCHES.Item(m))).Font
                                .Bold = True
                                .ColorIndex = c
                            End With
                        Loop
                    Next m
                Next v
            End With
        Next r
    End With

    Set cMATCHES = Nothing
    Set vbaRGX = Nothing

End Sub

在下面的样本结果图中,staight位置项以粗体蓝色标注,RegEx模式匹配以粗体红色标注。

regex keyword from phrases

随意修改和附加其他关键字,短语和RegEx模式。