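Regex to find a phrase while ignoring certain words

Date: 2015-03-22 12:00:37

Tags: regex vb.net

I am new to regular expressions. I want to search a string for multiple words while ignoring common words such as "in", "of", and "the", as well as special characters such as commas and backslashes.

My code: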

  Dim StringToSearchFrom As String = "Thus, one shifts one's focus in a variety of directions all at the same time"
  Dim PhraseToSearch As String = "focus variety directions"
  Dim found1 As Match = Regex.Match(StringToSearchFrom, Regex needed)
        If found1.Success Then
            MsgBox(found1.Index)
        End If

The regex should ignore the whole words "in", "a", and "of" while looking for PhraseToSearch, and then return the index of the phrase's first word ("focus"). Thanks.

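1 answer:

Answer 0 (score: 1)

You can use a regex like the one below, which you will have to build dynamically. This is a proof-of-concept example that matches "focus ... variety" in the string, skipping over "in" and "a":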

Public Dim MyRegex As Regex = New Regex( _
      "focus(?:(?:\b(?:in|of|a|the)\b\s*|[\p{P}\p{S}\p{Z}]*)*)variety", _
    RegexOptions.IgnoreCase _
    Or RegexOptions.CultureInvariant _
    Or RegexOptions.Compiled _
    )
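
To check what the proof-of-concept pattern actually matches, here is a minimal sketch that runs it against the sample sentence from the question. The module name `ProofOfConcept` and the console output are assumptions made for illustration; the pattern itself is copied from above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ProofOfConcept
    Sub Main()
        Dim input As String = "Thus, one shifts one's focus in a variety of directions all at the same time"
        ' Same pattern as above: "focus", then any run of ignorable words,
        ' punctuation, symbols, or separators, then "variety"
        Dim pattern As String = "focus(?:(?:\b(?:in|of|a|the)\b\s*|[\p{P}\p{S}\p{Z}]*)*)variety"
        Dim m As Match = Regex.Match(input, pattern,
                                     RegexOptions.IgnoreCase Or RegexOptions.CultureInvariant)
        If m.Success Then
            ' Index is the zero-based position where "focus" starts
            Console.WriteLine("Index: " & m.Index)
            Console.WriteLine("Matched text: " & m.Value)
        Else
            Console.WriteLine("No match")
        End If
    End Sub
End Module
```

For this input the match should start at index 23 and cover "focus in a variety".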

Explanation

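To make parts of the string optional, we still have to account for them in the pattern. If you put (?:(?:\b(?:in|of|a|the)\b\s*|[\p{P}\p{S}\p{Z}]*)*) in place of whatever separates the words of the search phrase, it will match any run of words from the list (?:in|of|a|the) (update it with your own word list), punctuation \p{P}, symbols \p{S}, and separators \p{Z} (which include spaces).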

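  ' Requires Imports System.Text.RegularExpressions at the top of the file
  Dim StringToSearchFrom As String = "Thus, one shifts one's focus in a variety of directions all at the same time"
  Dim PhraseToSearch As String = "focus variety directions"
  ' Matches any run of ignorable words, punctuation, symbols, and separators
  Dim optional_pattern As String = "(?:(?:\b(?:in|of|a|the)\b\s*|[\p{P}\p{S}\p{Z}]*)*)"
  ' Split the phrase into words and escape each one so it is matched literally
  Dim words As String() = PhraseToSearch.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
  For i As Integer = 0 To words.Length - 1
      words(i) = "\b" & Regex.Escape(words(i)) & "\b"
  Next
  ' Put the optional pattern between consecutive words
  Dim rgx_Search As New Regex(String.Join(optional_pattern, words), RegexOptions.IgnoreCase)
  ' And then apply our regex
  Dim found1 As Match = rgx_Search.Match(StringToSearchFrom)
    If found1.Success Then
        MsgBox(found1.Index)
    End If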
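
Since the word list is meant to be replaced with your own, it may help to parameterize it. The following is only a sketch under some assumptions: `BuildPhraseRegex` is a hypothetical helper (not part of the answer above), the phrase is split on spaces, and every phrase word starts and ends with a word character so that `\b` behaves as expected. It uses a slightly simpler gap pattern with no empty alternatives.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module PhraseSearch
    ' Hypothetical helper: builds a regex that matches the words of "phrase" in order,
    ' allowing ignorable words, punctuation, symbols, and separators between them.
    Function BuildPhraseRegex(phrase As String, ignoredWords As String()) As Regex
        ' Characters that may always appear between phrase words
        Dim gap As String = "[\p{P}\p{S}\p{Z}]"
        If ignoredWords.Length > 0 Then
            ' Escape each ignorable word and add the alternation as whole words
            Dim ignored As String = String.Join("|",
                ignoredWords.Select(Function(w) Regex.Escape(w)).ToArray())
            gap = "\b(?:" & ignored & ")\b|" & gap
        End If
        gap = "(?:" & gap & ")*"

        ' Escape each phrase word and require word boundaries around it
        Dim words As String() = phrase.
            Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries).
            Select(Function(w) "\b" & Regex.Escape(w) & "\b").
            ToArray()

        Return New Regex(String.Join(gap, words),
                         RegexOptions.IgnoreCase Or RegexOptions.CultureInvariant)
    End Function

    Sub Main()
        Dim input As String = "Thus, one shifts one's focus in a variety of directions all at the same time"
        Dim rgx As Regex = BuildPhraseRegex("focus variety directions",
                                            New String() {"in", "of", "a", "the"})
        Dim m As Match = rgx.Match(input)
        If m.Success Then
            Console.WriteLine(m.Index & ": " & m.Value)
        Else
            Console.WriteLine("No match")
        End If
    End Sub
End Module
```

With the sample sentence this should print the index 23 followed by the matched text "focus in a variety of directions". Escaping both the phrase words and the ignorable words with Regex.Escape keeps characters such as "." or "+" in user input from being interpreted as regex syntax.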