Selecting a range with RegEx

Time: 2013-03-17 01:01:03

Tags: regex syntax vbscript

I'll explain what I'm after with sample code. My function GetDox is close, but it is still incomplete. Here is the test code.

'test begin...
'<dox>
'  <member type="Public Sub" name="Increment" return="void">
'    <param type="Integer" name="nBase" out="true" />
'    <param type="Integer" name="nStep" out="false" />
'    <purpose>
'      purpose here...
'    </purpose>
'  </member>
'  <member ... />
'</dox>
'other comments here...
Public Sub Increment(nBase, nStep) 'some example content
    nBase = nBase + nStep
End Sub
'<Unwonted_Item />

Dim source  'reading the same file just for simplification
With CreateObject("Scripting.FileSystemObject")
    With .OpenTextFile(WScript.ScriptFullName, 1, False)
        source = .ReadAll
    End With
End With
result = GetDox(source)
WScript.Echo result  'display our result

Function GetDox(sCode)  'unfinished function
    Dim regEx, Match, Matches, mVal, sEnd
    sEnd = "</dox>" & vbNewLine
    Set regEx = New RegExp
    regEx.Pattern = "('<dox>\n|'\s*<.*)" 'my ugly pattern
    regEx.IgnoreCase = True
    regEx.Global = True
    Set Matches = regEx.Execute(sCode)
    For Each Match In Matches
        mVal = Match.Value
        mVal = Replace(mVal, vbCr, vbNewLine)
        mVal = Right(mVal, Len(mVal) - 1)
        GetDox = GetDox & mVal
        If mVal = sEnd Then Exit For
    Next
End Function

This is what I get:

<dox>
  <member type="Public Sub" name="Increment" return="void">
    <param type="Integer" name="nBase" out="true" />
    <param type="Integer" name="nStep" out="false" />
    <purpose>
    </purpose>
  </member>
  <member ... />
</dox>

This is what I need:

<dox>
  <member type="Public Sub" name="Increment" return="void">
    <param type="Integer" name="nBase" out="true" />
    <param type="Integer" name="nStep" out="false" />
    <purpose>
      purpose here...
    </purpose>
  </member>
  <member ... />
</dox>

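The "purpose here..." line is missing, and I know my whole RegExp.Pattern is weak. I just want to select everything that starts with <dox> and ends with </dox>, including everything in between, but I'm still stuck on the pattern syntax.

P.S. Thanks to all the great help (thank you, everyone), this is my working function now: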

Function GetDox(sCode)
    GetDox = vbNullString
    With New RegExp
        .Pattern    = "<dox>[\s\S]*?</dox>"
        .IgnoreCase = True
        .Global     = False
        With .Execute(sCode)
            If .Count = 0 Then Exit Function
            GetDox  = .Item(0).Value
        End With
        .Pattern    = "^'"
        .Global     = True
        .Multiline  = True
        GetDox = .Replace(GetDox, "")
    End With
End Function

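2 Answers:

Answer 0 (score: 2)

I would first remove the leading single quotes. Multiline must be enabled so that ^ matches at the start of every line, not only at the start of the string:

regEx.Pattern   = "^'"
regEx.Global    = True
regEx.Multiline = True
sCode = regEx.Replace(sCode, "")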
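
The effect of the Multiline flag on the ^ anchor can be seen in a minimal sketch like the one below. The sample string is made up for illustration and is not taken from the question's file; the script assumes it is run with cscript or wscript.

```vbscript
Option Explicit
Dim re, sample

' Hypothetical three-line input, each line starting with a comment quote
sample = "'<dox>" & vbNewLine & "'  text" & vbNewLine & "'</dox>"

Set re = New RegExp
re.Pattern = "^'"
re.Global = True

' Without Multiline, ^ only matches at the very start of the string,
' so only the first quote is removed.
re.Multiline = False
WScript.Echo re.Replace(sample, "")

' With Multiline, ^ also matches after each line break,
' so the leading quote of every line is removed.
re.Multiline = True
WScript.Echo re.Replace(sample, "")
```

Quotes that appear later in a line, such as a trailing comment after a statement, are left alone in both cases because the pattern is anchored.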

Then extract the XML text:

regEx.Pattern = "<dox>[\s\S]*?</dox>"
regEx.Global  = False
regEx.IgnoreCase = True
Set m = regEx.Execute(sCode)
If m.Count > 0 Then GetDox = m(0).Value

After that, you should read the XML into a DOM tree for further processing:

Set xml = CreateObject("Msxml2.DOMDocument.6.0")
xml.async = False
xml.loadXML result
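
Since loadXML returns False when the text is not well-formed, it may be worth checking the result before using the document. The sketch below assumes a small hand-written XML string (sXml is a hypothetical stand-in for the value returned by GetDox). Note that the placeholder <member ... /> in the question's sample is not well-formed XML and would be reported as a parse error.

```vbscript
Option Explicit
Dim xml, sXml

' Hypothetical input; in the question this would be the text returned by GetDox
sXml = "<dox><member type=""Public Sub"" name=""Increment"" return=""void"" /></dox>"

Set xml = CreateObject("Msxml2.DOMDocument.6.0")
xml.async = False

If xml.loadXML(sXml) Then
    ' Parsing succeeded, the DOM tree is ready to use
    WScript.Echo "Root element: " & xml.documentElement.nodeName
Else
    ' Parsing failed, parseError describes what went wrong and where
    With xml.parseError
        WScript.Echo "Parse error " & .errorCode & " at line " & .line & ": " & .reason
    End With
End If
```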

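If your XML is in a separate file, you should load the XML directly from the file and use XPath expressions to extract the nodes, as @FrankSchmitt suggested in his comment.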

Set xml = CreateObject("Msxml2.DOMDocument.6.0")
xml.async = False
xml.load "C:\path\to\your.xml"

Set nodes = xml.selectNodes("//dox")
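
Once the document is loaded, the selected nodes can be walked with For Each. The following is only a sketch: it loads a hypothetical inline string with loadXML instead of a file so that it is self-contained, and the element and attribute names are taken from the question's sample.

```vbscript
Option Explicit
Dim xml, member, param

Set xml = CreateObject("Msxml2.DOMDocument.6.0")
xml.async = False

' Hypothetical well-formed input modeled on the question's <dox> block
xml.loadXML "<dox>" & _
    "<member type=""Public Sub"" name=""Increment"" return=""void"">" & _
    "<param type=""Integer"" name=""nBase"" out=""true"" />" & _
    "<param type=""Integer"" name=""nStep"" out=""false"" />" & _
    "<purpose>purpose here...</purpose>" & _
    "</member>" & _
    "</dox>"

' List every member and its parameters using XPath
For Each member In xml.selectNodes("//dox/member")
    WScript.Echo member.getAttribute("name") & " (" & member.getAttribute("type") & ")"
    For Each param In member.selectNodes("param")
        WScript.Echo "  param " & param.getAttribute("name") & " As " & param.getAttribute("type")
    Next
Next
```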

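XML is not line-oriented and should not be parsed as if it were. If you don't handle it properly, things can break in interesting ways.

Answer 1 (score: 1)

To fix your code, you can use this regex: ('<dox>\n|'\s*[\S \t]*)

Another approach is to first capture everything you need with <dox>[\s\S]+?<\/dox> and then run a replacement on the result:
Search: ^' and replace with an empty string (with the Global and Multiline flags set, so that every line is processed)

Or, to also strip the leading whitespace:
Search: ^'\s* and replace with an empty string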