VBA regular expression using lookahead

Time: 2016-01-26 22:00:03

Tags: regex vba

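Given the following string:

  

"First there is a sentence or two, then a citation which I'd like to extract. Bookwriter, Johnny J., Book Title, 50th Edition, Publishing Company, United States, 2016, p. 18"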

Using this regular expression: "\b[^\.\;]+(,\s+p+\.\s+(\d+\-\d+|\d+))"

I am able to match this part of the string:

  

"Book Title, 50th Edition, Publishing Company, United States, 2016, p. 18"

The match I want is:

  

"Bookwriter, Johnny J., Book Title, 50th Edition, Publishing Company, United States, 2016, p. 18"

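To oversimplify, the current regex finds the string that sits between a period and a page reference such as ", p. 18" and that contains no semicolons or periods.

I'd like to adjust this so that the regex allows a period to occur when it is preceded by a space and a capital letter (as in " J."). I know VBA doesn't have lookbehind functionality.

The VBA code that runs the example I've given is as follows: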

Dim exampleString As String
Dim re As Object
Dim matches As Object
exampleString = "First there is a sentence or two, then a citation which I'd like to extract. Bookwriter, Johnny J., Book Title, 50th Edition, Publishing Company, United States, 2016, p. 18."
Set re = CreateObject("vbscript.regexp")
With re
    .Global = True
    .pattern = "\b[^\.\;]+(,\s+p\.\s(\d+\-\d+|\d+))"
    Set matches = .Execute(exampleString)
End With

1 Answer:

Answer 0: (score: 1)

Here is a sample VBA Sub that gets what you need:

Sub Test1()
    Dim str As String
    Dim objRegExp As Object
    Dim objMatches As Object
    str = "First there is a sentence or two, then a citation which I'd like to extract. Bookwriter, Johnny J., Book Title, 50th Edition, Publishing Company, United States, 2016, p. 18."
    Set objRegExp = CreateObject("VBScript.RegExp") ' Create the RegExp object
    objRegExp.Pattern = "(?:\.\s+(?=[A-Z]))([^;]+(?:,\s+p\.\s+\d+(?:-\d+)?))" ' Set pattern
    Set objMatches = objRegExp.Execute(str)  ' Execute the regex match
    If objMatches.Count <> 0 Then            ' Check if there are any items in the result
        Debug.Print objMatches.Item(0).SubMatches.Item(0) ' Print Match 1, Submatch 1
        ' > Bookwriter, Johnny J., Book Title, 50th Edition, Publishing Company, United States, 2016, p. 18
    End If
End Sub

The pattern is

(?:\.\s+(?=[A-Z]))([^;]+(?:,\s+p\.\s+\d+(?:-\d+)?))

See the demo.

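The main addition is the leading (?:\.\s+(?=[A-Z])) subpattern. It matches a . followed by one or more whitespace characters (\s+), which must in turn be followed by an uppercase letter (the letter is not consumed, only checked by the positive lookahead (?=[A-Z])). Since that prefix is consumed as part of the overall match, the citation itself is read from the first capture group rather than from the whole match. I also contracted (\d+\-\d+|\d+) into \d+(?:-\d+)?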
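
If you need this in more than one place, the same idea can be wrapped in a small function. This is only a sketch: the names ExtractCitation, DemoExtractCitation and inputText are made up for illustration, and the sample string with a page range is an assumed input, not one from the question. It uses the same pattern as above and reads the first submatch, so the consumed ". " prefix is left out of the result.

```vba
' Hypothetical helper (not part of the original answer): returns the first
' citation found in inputText, or an empty string when nothing matches.
Function ExtractCitation(ByVal inputText As String) As String
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    ' The leading non-capturing group consumes the period and whitespace that
    ' precede the citation; the lookahead only checks for an uppercase letter.
    ' Capture group 1 holds the citation itself.
    re.Pattern = "(?:\.\s+(?=[A-Z]))([^;]+(?:,\s+p\.\s+\d+(?:-\d+)?))"
    Set matches = re.Execute(inputText)

    If matches.Count > 0 Then
        ExtractCitation = matches.Item(0).SubMatches.Item(0)
    Else
        ExtractCitation = ""
    End If
End Function

Sub DemoExtractCitation()
    ' Assumed sample input that ends in a page range rather than a single page.
    Debug.Print ExtractCitation("Some intro text. Author, A. B., Title, 2016, p. 18-20.")
End Sub
```

One thing to keep in mind: because the pattern requires a period and whitespace before the citation, a citation that starts at the very beginning of the string will not be matched by this version.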