How to skip the " =" and then capture all comma-separated words

Date: 2013-05-17 23:42:21

Tags: regex vbscript

I am currently doing this with InStr / Split, but I generally find regular expressions to be much faster (this runs in an inner loop, with 100K+ tests per run).

The general form is:

word0 = word1, word2, word3...  

There are one or more words to the right of the =. A word is defined as [\w.-]+. I also need to allow whitespace at any point. The = is required.

I want word1, word2, and word3 returned in the Matches collection.

The = is giving me trouble. Depending on the pattern, I get either a single match or none.

Here is some test code. Change RE.Pattern on line 17 of the script to test.

Option Explicit

Test1 "word1, word2",""
Test1 " word0 = word1, word.2  , word3.qrs_t-1", "word1 word.2 word3.qrs_t-1"
Test1 "word0=word1", "word1"

WScript.Quit

Sub Test1(TestString, CorrectOutput)

    Dim RE, Matches, Answer
    Dim i, j

    Set RE     = New RegExp
    RE.Global  = True

    RE.Pattern = "=([\w.-]+)"

    Set Matches = RE.Execute(TestString)

    Answer =  "Input:  " & vbTab & TestString & vbLf
    Answer = Answer & "Correct:" & vbTab & CorrectOutput & vbLf

    Answer = Answer &  "Actual: " & vbTab

    For i = 0 To Matches.Count -1
        If i > 0 Then
            Answer = Answer & " "
        End If
        Answer = Answer & Matches(i).value
    Next

    MsgBox Answer

End Sub

2 Answers:

Answer 0 (score: 0)

Description

Try this; it will:

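  • Require a value before the equals sign
  • Require the equals sign
  • Require at least 1 value after the equals sign
  • Return each of the 1 to 3 comma-separated blocks of text
  • Trim whitespace from all returned values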

(?:^\s*?(\b[^=]*?\b)(?:\s{0,}[=]\s{0,}))(?:(['"]?)(\b[^,]*\b)\2\s*?)(?:$|(?:[,]\s*?(['"]?)(\b[^,]*\b)\4\s*?)(?:$|[,]\s*?(['"]?)(\b[^,]*\b)\6\s*?$))


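  • Group 0 matches the full string (if it is valid)
  • Groups 1-7
    1. The value before the equals sign
    2. The quote delimiter, if any, for value 1
    3. The first value in the value list
    4. The quote delimiter, if any, for value 2
    5. The second value in the value list
    6. The quote delimiter, if any, for value 3
    7. The third value in the value list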

VB.NET code example to demonstrate how the regular expression works

Imports System.Text.RegularExpressions
Module Module1
  Sub Main()
    Dim sourcestring as String = "replace with your source string"
    ' Each double quote inside the pattern is doubled ("") so that the VB.NET string literal stays valid.
    Dim re As Regex = New Regex("(?:^\s*?(\b[^=]*?\b)(?:\s{0,}[=]\s{0,}))(?:(['""]?)(\b[^,]*\b)\2\s*?)(?:$|(?:[,]\s*?(['""]?)(\b[^,]*\b)\4\s*?)(?:$|[,]\s*?(['""]?)(\b[^,]*\b)\6\s*?$))",RegexOptions.IgnoreCase OR RegexOptions.Multiline OR RegexOptions.Singleline)
    Dim mc as MatchCollection = re.Matches(sourcestring)
    Dim mIdx as Integer = 0
    For each m as Match in mc
      For groupIdx As Integer = 0 To m.Groups.Count - 1
        Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
      Next
      mIdx=mIdx+1
    Next
  End Sub
End Module

$matches Array:
(
    [0] => Array
        (
            [0] =>  word0 = word1, word.2  , word3.qrs_t-1
        )

    [1] => Array
        (
            [0] => word0
        )

    [2] => Array
        (
            [0] => 
        )

    [3] => Array
        (
            [0] => word1
        )

    [4] => Array
        (
            [0] => 
        )

    [5] => Array
        (
            [0] => word.2
        )

    [6] => Array
        (
            [0] => 
        )

    [7] => Array
        (
            [0] => word3.qrs_t-1
        )

)
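
Because the question is about VBScript, here is a minimal sketch of reading the same capture groups through the VBScript RegExp object. It assumes the script host's regex engine accepts the non-capturing groups, lazy quantifiers, and backreferences used in the pattern; the variable names (re, matches, m, words) are illustrative only. Two details differ from the VB.NET version: SubMatches is zero-based, so groups 3, 5, and 7 sit at indices 2, 4, and 6, and each double quote inside the pattern is written as "" in a VBScript string literal.

```vbscript
Option Explicit

Dim re, matches, m, i, words

Set re = New RegExp
' Same pattern as in this answer; "" is a literal double quote in VBScript.
re.Pattern = "(?:^\s*?(\b[^=]*?\b)(?:\s{0,}[=]\s{0,}))(?:(['""]?)(\b[^,]*\b)\2\s*?)(?:$|(?:[,]\s*?(['""]?)(\b[^,]*\b)\4\s*?)(?:$|[,]\s*?(['""]?)(\b[^,]*\b)\6\s*?$))"

Set matches = re.Execute(" word0 = word1, word.2  , word3.qrs_t-1")

words = ""
If matches.Count > 0 Then
    Set m = matches(0)
    ' SubMatches is zero-based: indices 2, 4 and 6 are groups 3, 5 and 7.
    For i = 2 To 6 Step 2
        ' Groups for absent values come back empty, so skip them.
        If Len(m.SubMatches(i)) > 0 Then
            If Len(words) > 0 Then words = words & " "
            words = words & m.SubMatches(i)
        End If
    Next
End If

WScript.Echo words
```

Since the pattern spells out each value explicitly and is anchored at both ends, an input with more than three values after the equals sign will not match at all.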

Answer 1 (score: 0)

Use the following regular expression to extract the substring containing the word list from the input string:

str = "..."

Set re = New RegExp
re.Pattern = "^.*?=((?:[^,]+)(?:,[^,]+)*)"
re.Global  = True

Set m = re.Execute(str)

Then use a second expression to replace the separating commas and the whitespace around them with single spaces:

Set re2 = New RegExp
re2.Pattern = "\s*,\s*"
re2.Global  = True

wordlist = ""
If m.Count > 0 Then
  wordlist = Trim(re2.Replace(m(0).SubMatches(0), " "))
End If

WScript.Echo wordlist
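
If the words are needed as separate items in a Matches collection, as the question asks, rather than as one space-separated string, the second step can be an Execute call instead of a Replace. The following is a rough sketch of that variation, not part of the original answer: the function name WordsAfterEquals and the simplified first pattern "=(.*)$" are assumptions made for illustration, and the word pattern [\w.-]+ is taken from the question.

```vbscript
Option Explicit

' Hypothetical helper: returns a Matches collection with one item per word
' to the right of the first "=", or Nothing when the input has no "=".
Function WordsAfterEquals(s)
    Dim reRhs, reWord, m

    Set WordsAfterEquals = Nothing

    ' Step 1: isolate everything after the first "=".
    Set reRhs = New RegExp
    reRhs.Pattern = "=(.*)$"
    Set m = reRhs.Execute(s)
    If m.Count = 0 Then Exit Function

    ' Step 2: collect every word in that substring.
    Set reWord = New RegExp
    reWord.Global = True
    reWord.Pattern = "[\w.-]+"
    Set WordsAfterEquals = reWord.Execute(m(0).SubMatches(0))
End Function

Dim words, w, out
Set words = WordsAfterEquals(" word0 = word1, word.2  , word3.qrs_t-1")

out = ""
If Not words Is Nothing Then
    For Each w In words
        out = out & "[" & w.Value & "]"
    Next
End If

WScript.Echo out
```

For a tight loop such as the one described in the question, it is probably worth creating the two RegExp objects once outside the loop and reusing them, rather than building them on every call as this sketch does.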