Regex to match a substring, but not a word that contains it (word boundary issue)

Date: 2018-05-11 16:18:32

Tags: regex

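I have 100,000 files (mostly office-type documents). I am using Excel VBA to check all file names for the word "list", while trying to avoid false positives (e.g. "specialist").

The answer to "Regex for matching a substring, but not containing word" is almost what I need (\b(?!String)\w*ring\w*\b), except that my file names do not have neat word boundaries.

The current pattern \b(?!specialist)\w*list\w*\b correctly ignores some variants (3 Specialist, 6-specialist, Specialists, etc.). Can the pattern be modified so that it also rejects the following variants: 1Specialist, 2_specialist, Xspecialists? If so, could someone point me in the right direction?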
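To make the symptom concrete: digits and underscores count as word characters, so there is no \b between "1" and "S" or between "_" and "s", and the lookahead is only checked at the start of the whole token. The following is a minimal sketch of that behaviour, not part of the original routine. The Sub name DemoWordBoundary is made up for illustration, and IgnoreCase = True is assumed.

```vba
Sub DemoWordBoundary()
    ' Hypothetical demo: prints each sample name and whether the pattern matches it.
    Dim re As Object, names As Variant, i As Long
    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True   ' assumption: matching is case-insensitive
    re.Pattern = "\b(?!specialist)\w*list\w*\b"

    names = Array("3 Specialist", "6-specialist", "Specialists", _
                  "1Specialist", "2_specialist", "Xspecialists")

    For i = LBound(names) To UBound(names)
        ' A result of True here means the name would be reported as a hit.
        Debug.Print names(i), re.Test(names(i))
    Next i
End Sub
```

Under these assumptions the first three names should print False and the last three True, which is the false-positive behaviour described above.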

Any help or advice is much appreciated,
M

Here is the recursive subroutine I have been using (apologies for the poor formatting):

Sub RecursiveFolderPATTERN(objFolder As Scripting.Folder, IncludeSubfolders As Boolean)

'Declare the variables
Dim objFile As Object
Dim objSubFolder As Scripting.Folder
Dim NextRow As Long

Dim objRegExp As Object
Set objRegExp = CreateObject("VBScript.RegExp")
objRegExp.Pattern = "([^A-Za-z]|^)(address|info|data)?lists?([^A-Za-z]|$)"
objRegExp.IgnoreCase = True

'Find the next available row
NextRow = Cells(Rows.Count, "A").End(xlUp).Row + 1

'Loop through each file in the folder
For Each objFile In objFolder.Files
    If objRegExp.test(objFile) Then
        Cells(NextRow, "A").Value = objFile.Name
        Cells(NextRow, "E").Value = objFile.Size
        Cells(NextRow, "F").Value = objFile.Type
        Cells(NextRow, "G").Value = objFile.DateCreated
        Cells(NextRow, "H").Value = objFile.DateLastAccessed
        Cells(NextRow, "I").Value = objFile.DateLastModified
        Cells(NextRow, "J").Value = objFile.Path
        NextRow = NextRow + 1
    End If
Next objFile

'Loop through files in the subfolders
If IncludeSubfolders Then
    For Each objSubFolder In objFolder.Subfolders
        Call RecursiveFolderPATTERN(objSubFolder, True)
    Next objSubFolder
End If

End Sub

Edit after the answers: changing the line If objRegExp.test(objFile) Then to If objRegExp.test(objFile.Name) Then solved the problem.
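A likely reason this fix works is that passing the File object itself hands the regex its default property, Path, so folder names are tested along with the file name. The sketch below imitates that with plain strings. The path is invented for illustration, and DemoPathVersusName is a hypothetical name.

```vba
Sub DemoPathVersusName()
    ' Hypothetical path, chosen only to illustrate the difference.
    Const SAMPLE_PATH As String = "C:\Mailing Lists\report.docx"
    Dim re As Object, fileName As String
    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True
    re.Pattern = "([^A-Za-z]|^)(address|info|data)?lists?([^A-Za-z]|$)"

    ' Keep only the part after the last backslash, i.e. the file name.
    fileName = Mid$(SAMPLE_PATH, InStrRev(SAMPLE_PATH, "\") + 1)

    Debug.Print re.Test(SAMPLE_PATH)   ' full path: the folder name contains "Lists"
    Debug.Print re.Test(fileName)      ' file name only: "report.docx"
End Sub
```

With this sample the full path should test True because of the folder name, while the bare file name should test False.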

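Edit after the alternative answer: changing the pattern from "([^A-Za-z]|^)(address|info|data)?lists?([^A-Za-z]|$)" to "(^(?!.*specialist).*list.*$)" also works well. Both approaches have their merits, so I intend to use both.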

2 answers:

Answer 0: (score: 0)

Would something like this work for you?

([^A-Za-z]|^)list([^A-Za-z]|$)

It matches the word "list" only when it is not surrounded by other letters.

Or are some words that contain "list" acceptable?

Edit: to also match the plural "lists", it can be changed to:

([^A-Za-z]|^)lists?([^A-Za-z]|$)

Edit 2: to whitelist certain prefixes, you can change it to this (whitelisting "address", "info" and "data" as prefixes for example purposes):

([^A-Za-z]|^)(address|info|data)?lists?([^A-Za-z]|$)
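A quick way to check the whitelist pattern is to run it over a handful of names. This is only a sketch: the file names are invented, DemoWhitelistPattern is a hypothetical name, and IgnoreCase = True is assumed.

```vba
Sub DemoWhitelistPattern()
    ' Hypothetical file names, chosen only to exercise the pattern.
    Dim re As Object, names As Variant, i As Long
    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True   ' assumption: matching is case-insensitive
    re.Pattern = "([^A-Za-z]|^)(address|info|data)?lists?([^A-Za-z]|$)"

    names = Array("list.xlsx", "AddressLists_2018.doc", "my-datalist.csv", _
                  "Specialist.doc", "2_specialist.doc", "checklist.txt")

    For i = LBound(names) To UBound(names)
        Debug.Print names(i), re.Test(names(i))
    Next i
End Sub
```

The first three names should print True and the last three False. Note the trade-off: a name such as "checklist.txt" is rejected as well, because "check" is not among the whitelisted prefixes.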

Answer 1: (score: 0)

If your goal is to find file names that contain "list" but not "specialist", try the following regex:

(?i)^(?!.*specialist).*list.*$

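Edit

Remove (?i) from the pattern (VBScript.RegExp does not support inline modifiers; setting IgnoreCase = True does the same job) and test it with the following snippet: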

Sub RecursiveFolderPATTERN()
  Dim objRegExp As Object, arrStrings() As String, _
      i As Long, objMatch As Object
  Set objRegExp = CreateObject("VBScript.RegExp")
  With objRegExp
    .Global = True
    .IgnoreCase = True
    .MultiLine = False
    .Pattern = "^(?!.*specialist).*list.*$"
  End With
  Dim TestString As String
  TestString = "3 Specialist" & vbNewLine & _
               "6-specialist" & vbNewLine & _
               "Specialists" & vbNewLine & _
               "true SpeciaList" & vbNewLine & _
               "1 Specialist" & vbNewLine & _
               "2_specialist" & vbNewLine & _
               "Xspecialists" & vbNewLine & _
               "TheListOfSpecialists.xlsx" & vbNewLine & _
               "List" & vbNewLine & _
               "lISTs" & vbNewLine & _
               "Globalistics" & vbNewLine & _
               "GlobalList.doc" & vbNewLine & _
               "fatalistic" & vbNewLine & _
               "The big list of PII.csv" & vbNewLine & _
               "A few lISTs with something.xls"
  arrStrings = Split(TestString, vbNewLine)
  For i = LBound(arrStrings) To UBound(arrStrings)
    If objRegExp.Test(arrStrings(i)) Then
      Debug.Print arrStrings(i)
    End If
  Next
End Sub