I'm trying to use Excel 2000/2003 to parse a line from an mmCIF protein file into separate tokens. A worst-case line might look like this:
token1 token2 "token's 1a',1b'" 'token4"5"' 12 23.2 ? . 'token' tok'en to"ken
which should become the following tokens:
token1
token2
token's 1a',1b' (note: the double quotes have disappeared)
token4"5" (note: the single quotes have disappeared)
12
23.2
?
.
token (note: the single quotes have disappeared)
tok'en
to"ken
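I'd like to know whether a RegEx can split this kind of line into tokens.
Answer 0: (score: 1)
This can be done:
You need to add a reference to "Microsoft VBScript Regular Expressions 5.5" in your VBA project, then...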
Private Sub REFinder(PatternString As String, StringToTest As String)
    Dim RE As RegExp
    Dim Matches As MatchCollection
    Dim Match As Object
    Dim Item As Variant

    Set RE = New RegExp
    With RE
        .Global = True
        .MultiLine = False
        .IgnoreCase = False
        .Pattern = PatternString
    End With
    Set Matches = RE.Execute(StringToTest)
    For Each Match In Matches
        ' Print the matched text, its 0-based start index and length, and the same text cut from the input
        Debug.Print Match.Value & " ~~~ " & Match.FirstIndex & " - " & Match.Length & " = " & Mid(StringToTest, Match.FirstIndex + 1, Match.Length)
        ' You get a submatch for each capture group (one per alternative when using ORs)
        For Each Item In Match.SubMatches
            Debug.Print "Submatch:" & Item
        Next Item
        Debug.Print
    Next Match
    Set RE = Nothing
    Set Matches = Nothing
    Set Match = Nothing
End Sub
Sub DoIt()
    ' This simply splits on spaces...
    REFinder "([.^\w]+\s)|(.+$)", "Token1 Token2 65.56"
End Sub
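This is obviously only a very simple example, because I don't know RegExp very well; it is more to show you how it can be done in VBA (you will probably want to do something more useful with the resulting tokens than Debug.Print!). I'm afraid I'll have to leave writing the RegExp expression itself to someone else!
Simon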
Answer 1: (score: 1)
Nice puzzle. Thanks.
This pattern (aPatt below) separates the tokens, but I can't figure out how to remove the outer quotes.
tallpaul() produces:
token1
token2
"token's 1a',1b'"
'token4"5"'
12
23.2
?
.
'token'
tok'en
to"ken
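If you can figure out how to lose the outer quotes, please let us know. This requires a reference to "Microsoft VBScript Regular Expressions" in order to work.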
Option Explicit
' Returns the collection of matches of patrn in strng
Function RegExpTest(patrn As String, strng As String) As MatchCollection
    Dim regEx As RegExp                      ' Create variable.
    Set regEx = New RegExp                   ' Create a regular expression.
    regEx.Pattern = patrn                    ' Set pattern.
    regEx.IgnoreCase = True                  ' Set case insensitivity.
    regEx.Global = True                      ' Set global applicability.
    Set RegExpTest = regEx.Execute(strng)    ' Execute search.
End Function
Function tallpaul() As Boolean
    Dim aString As String
    Dim aPatt As String
    Dim aMatch, aMatches
    ' Need to pad the string with leading and trailing spaces.
    aString = " token1 token2 ""token's 1a',1b'"" 'token4""5""' 12 23.2 ? . 'token' tok'en to""ken "
    aPatt = "(\s'[^']+'(?=\s))|(\s""[^""]+""(?=\s))|(\s[\w\?\.]+(?=\s))|(\s\S+(?=\s))"
    Set aMatches = RegExpTest(aPatt, aString)
    For Each aMatch In aMatches
        Debug.Print aMatch.Value
    Next
    tallpaul = True
End Function
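One possible way to lose the outer quotes, offered as a sketch rather than a definitive answer: put a capture group inside each pair of quotes and read Match.SubMatches instead of Match.Value. The names SplitCifLine and DemoSplitCifLine and the pattern itself are my own assumptions, not something from the posts above. The sketch uses late binding via CreateObject("VBScript.RegExp"), so it should not need the project reference. It assumes a quoted token opens right after whitespace (or at the start of the line) and closes at the first matching quote that is followed by whitespace or the end of the line.

```vba
' Hypothetical helper: splits one mmCIF-style line into tokens and
' drops the outer quotes of quoted tokens.
Function SplitCifLine(ByVal lineText As String) As Collection
    Dim re As Object, matches As Object, m As Object
    Dim tokens As New Collection

    Set re = CreateObject("VBScript.RegExp")   ' late binding, no reference needed
    re.Global = True
    ' Group 1: body of a '...' token, group 2: body of a "..." token,
    ' group 3: an unquoted run of non-space characters.
    ' The lookahead only accepts a token end that is followed by whitespace
    ' or the end of the line, so embedded quotes stay inside the token.
    re.Pattern = "(?:^|\s)(?:'(.*?)'|""(.*?)""|(\S+))(?=\s|$)"

    Set matches = re.Execute(lineText)
    For Each m In matches
        ' Only one of the three groups takes part in a match; the others are empty.
        tokens.Add m.SubMatches(0) & m.SubMatches(1) & m.SubMatches(2)
    Next m
    Set SplitCifLine = tokens
End Function

Sub DemoSplitCifLine()
    Dim tok As Variant
    For Each tok In SplitCifLine("token1 token2 ""token's 1a',1b'"" 'token4""5""' 12 23.2 ? . 'token' tok'en to""ken")
        Debug.Print tok
    Next tok
End Sub
```

Because the lazy `.*?` can only stop at a quote that is followed by whitespace or the end of the line, tokens such as token's 1a',1b' should keep their embedded quotes, and the input string should not need padding with leading and trailing spaces. A token that starts with a quote but never closes it falls through to the unquoted alternative and is returned with the quote still attached.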