我有一张带有两列的excel表。第一列是关键短语,第二列是消息。关键短语可能出现在消息列中。我需要知道在消息列中发生了多少次关键短语。
关键短语是一列,消息是第二列。消息列是1个或多个1个关键短语的组合(串联)。我需要找出每条消息包含多少关键短语。还有一些消息带有一些日期和数字。此外,一些消息中包含日期和数字,匹配的关键短语作为当前的日期/数字为(xx-xxx-xxxx)。
e.g。消息是“交易于2014年10月8日结束,因此不允许进一步交易”,关键词是“交易已结束(xx-xxx-xxxx) ”。还有“交易号4238428DDSSD有问题”的消息,关键短语是“交易号xxxxxxxx hass问题”。正则表达式匹配是必需的。
答案 0 :(得分:2)
您可以选择一些关键字短语,为它们创建正则表达式模式,然后对短语进行编码,以便可以在其上使用Range.Replace method来将相应的RegEx模式掩码替换为关键字短语。
在下文中,我使用 X00000000X , XSHORTDATEX 和 XDEALNMBRX 作为关键字中的占位符。这些将被替换为[0-9, - ] {7,8},[0-9, - ] {3} [az] {3} [0-9, - ] {3,5}和[0- 9] {7} [az] {5}。
X00000000X 旨在处理任何类似 1234567 或* 99-11-00 *的内容。 XSHORTDATEX 将处理dd-mmm-yy或dd-mmm-yyyy格式的日期(一旦转换为小写), XDEALNMBRX 将找到类似于的字母数字模式4238428DDSSD
此代码要求使用VBE的工具►参考命令将 Microsoft VBScript正则表达式库添加到VBA项目中。
Sub count_strings_inside_strings_rgx()
Dim rw As Long, lr As Long
Dim k As Long, p As Long, vKEYs As Variant, vPHRASEs As Variant, vCOUNTs As Variant
Dim sPATTERN As String, vbaRGX As New RegExp, cMATCHES As MatchCollection
ReDim vKEYs(0)
ReDim vPHRASEs(0)
With Worksheets("Sheet1") '<~~ set to the correct worksheet name\
'populate the vKEYs array
For rw = 2 To .Cells(Rows.Count, 1).End(xlUp).Row
vKEYs(UBound(vKEYs)) = LCase(.Cells(rw, 1).Value2)
ReDim Preserve vKEYs(UBound(vKEYs) + 1)
Next rw
ReDim Preserve vKEYs(UBound(vKEYs) - 1)
'populate the vPHRASEs array
For rw = 2 To .Cells(Rows.Count, 2).End(xlUp).Row
vPHRASEs(UBound(vPHRASEs)) = LCase(.Cells(rw, 2).Value2)
ReDim Preserve vPHRASEs(UBound(vPHRASEs) + 1)
Next rw
ReDim Preserve vPHRASEs(UBound(vPHRASEs) - 1)
ReDim vCOUNTs(0 To UBound(vPHRASEs))
For p = LBound(vPHRASEs) To UBound(vPHRASEs)
For k = LBound(vKEYs) To UBound(vKEYs)
sPATTERN = Replace(vKEYs(k), "x00000000x", "[0-9,\-]{7,8}")
sPATTERN = Replace(sPATTERN, "xshortdatex", "[0-9,\-]{3}[a-z]{3}[0-9,\-]{3,5}")
sPATTERN = Replace(sPATTERN, "xdealnmbrx", "[0-9]{7}[a-z]{5}")
sPATTERN = Replace(sPATTERN, "xshortwrapdatex", "\([0-9,\-]{3}[a-z]{3}[0-9,\-]{3,5}\)")
With vbaRGX
.Global = True
.Pattern = sPATTERN
Set cMATCHES = .Execute(vPHRASEs(p))
End With
vCOUNTs(p) = vCOUNTs(p) + cMATCHES.Count
Next k
Next p
.Cells(2, 3).Resize(UBound(vCOUNTs) + 1, 1) = Application.Transpose(vCOUNTs)
Call key_in_phrase_helper_rgx(vKEYs, .Range(.Cells(2, 2), .Cells(Rows.Count, 2).End(xlUp)))
End With
Set cMATCHES = Nothing
Set vbaRGX = Nothing
End Sub
Sub key_in_phrase_helper_rgx(vKYs As Variant, rPHRSs As Range)
Dim c As Long, m As Long, p As Long, r As Long, v As Long, sPTTRN As String
Dim vbaRGX As New RegExp, cMATCHES As MatchCollection
With rPHRSs
For r = 1 To rPHRSs.Rows.Count
With .Cells(r, 1)
.ClearFormats
For v = LBound(vKYs) To UBound(vKYs)
sPTTRN = Replace(vKYs(v), "x00000000x", "[0-9,\-]{7,8}")
sPTTRN = Replace(sPTTRN, "xshortdatex", "[0-9,\-]{3}[a-z]{3}[0-9,\-]{3,5}")
sPTTRN = Replace(sPTTRN, "xdealnmbrx", "[0-9]{7}[a-z]{5}")
sPTTRN = Replace(sPTTRN, "xshortwrapdatex", "\([0-9,\-]{2,3}[a-z]{3}[0-9,\-]{3,5}\)")
c = 5 + CBool(vKYs(v) <> sPTTRN) * 2
Debug.Print sPTTRN
With vbaRGX
.Global = True
.Pattern = sPTTRN
End With
Set cMATCHES = vbaRGX.Execute(LCase(.Value2))
For m = 0 To cMATCHES.Count - 1
p = 0
Do While CBool(InStr(p + 1, .Value2, cMATCHES.Item(m), vbTextCompare))
p = InStr(p + 1, .Value2, cMATCHES.Item(m), vbTextCompare)
'Debug.Print vKYs(v)
With .Characters(Start:=p, Length:=Len(cMATCHES.Item(m))).Font
.Bold = True
.ColorIndex = c
End With
Loop
Next m
Next v
End With
Next r
End With
Set cMATCHES = Nothing
Set vbaRGX = Nothing
End Sub
在下面的样本结果图中,staight位置项以粗体蓝色标注,RegEx模式匹配以粗体红色标注。
随意修改和附加其他关键字,短语和RegEx模式。