Question

我正在尝试使用Excel 2000/2003将mmCIF蛋白文件中的一行解析为单独的标记。最坏的情况可能看起来像这样：

token1 token2 "token's 1a',1b'" 'token4"5"' 12 23.2 ? . 'token' tok'en to"ken

哪个应成为以下令牌：

token1  
token2  
token's 1a',1b' (note: the double quotes have disappeared)  
token4"5" (note: the single quotes have disappeared)  
12  
23.2  
?  
.  
token (note: the single quotes have disappeared)  
to'ken  
to"ken

我希望看看RegEx是否有可能将这种线分成令牌？

Answer 1

可以这样做：

您需要在VBA项目中引用“Microsoft VBScript Regular Expressions 5.5”，然后......

Private Sub REFinder(PatternString As String, StringToTest As String)
    Set RE = New RegExp

    With RE
        .Global = True
        .MultiLine = False
        .IgnoreCase = False
        .Pattern = PatternString
    End With

    Set Matches = RE.Execute(StringToTest)

    For Each Match In Matches
        Debug.Print Match.Value & " ~~~ " & Match.FirstIndex & " - " & Match.Length & " = " & Mid(StringToTest, Match.FirstIndex + 1, Match.Length)

        ''#You get a submatch for each of the other possible conditions (if using ORs)
        For Each Item In Match.SubMatches
            Debug.Print "Submatch:" & Item
        Next Item
        Debug.Print
    Next Match

    Set RE = Nothing
    Set Matches = Nothing
    Set Match = Nothing
    Set SubMatch = Nothing
End Sub

Sub DoIt()
    ''#This simply splits by space...
    REFinder "([.^\w]+\s)|(.+$)", "Token1 Token2 65.56"
End Sub

这显然只是一个非常简单的例子，因为我对RegExp不是很了解，它更像是向你展示它如何在VBA中完成（你可能也想做一些比Debug.Print更有用的事情）由此产生的代币！）。我不得不把RegExp表达式写给别人我害怕！

西蒙

Answer 2

很好的谜题。感谢。

这种模式（下面的aPatt）将令牌分开，但我无法弄清楚如何删除外部引号。

tallpaul（）产生：

 token1
 token2
 "token's 1a',1b'"
 'token4"5"'
 12
 23.2
 ?
 .
 'token'
 tok'en
 to"ken

如果您能弄清楚如何丢失外部报价，请告诉我们。这需要引用“Microsoft VBScript正则表达式”才能工作。

Option Explicit
''returns a list of matches
Function RegExpTest(patrn, strng)
   Dim regEx   ' Create variable.
   Set regEx = New RegExp   ' Create a regular expression.
   regEx.Pattern = patrn   ' Set pattern.
   regEx.IgnoreCase = True   ' Set case insensitivity.
   regEx.Global = True   ' Set global applicability.
   Set RegExpTest = regEx.Execute(strng)   ' Execute search.
End Function

Function tallpaul() As Boolean
    Dim aString As String
    Dim aPatt As String
    Dim aMatch, aMatches

    '' need to pad the string with leading and trailing spaces.
    aString = " token1 token2 ""token's 1a',1b'"" 'token4""5""' 12 23.2 ? . 'token' tok'en to""ken "
    aPatt = "(\s'[^']+'(?=\s))|(\s""[^""]+""(?=\s))|(\s[\w\?\.]+(?=\s))|(\s\S+(?=\s))"
    Set aMatches = RegExpTest(aPatt, aString)

    For Each aMatch In aMatches
          Debug.Print aMatch.Value
    Next
    tallpaul = True
End Function

VBA中的RegEx：将复杂的字符串分解为多个标记？

2 个答案: