Splitting text into an array with a complex condition

Date: 2014-05-21 07:22:41

Tags: arrays string vbscript split

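I have a string that I need to split into an array. Most of the time the parts are separated by . (dot), but sometimes the string contains a part enclosed in curly braces { }, and any dot inside the braces must not be treated as a split character.

I have built the code below to do this, but I wonder whether there is a more elegant solution (for example, a regular expression).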

Pov = UCase(Trim(Pov))

'Loop through the Pov and escape any dots inside curly brackets
Level = 0
Escaped = ""
For Pos = 1 To Len(Pov)
    PosChar = Mid(Pov, Pos, 1)
    If PosChar = "{" Then
        Level = Level + 1
        Escaped = Escaped & PosChar
    ElseIf PosChar = "}" Then
        Level = Level - 1
        Escaped = Escaped & PosChar
    ElseIf PosChar = "." Then
        If Level > 0 Then
            Escaped = Escaped & "^^^ This is a nested dot ^^^"
        Else
            Escaped = Escaped & PosChar
        End If
    Else
        Escaped = Escaped & PosChar
    End If
Next

'Split the escaped Pov (not the original one) and restore any nested dots
PovSplit = Split(Escaped, ".")
For Part = LBound(PovSplit) To UBound(PovSplit)
    PovSplit(Part) = Replace(PovSplit(Part), "^^^ This is a nested dot ^^^", ".")
Next

1 Answer:

Answer 0 (score: 1)

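No, it can't be done "directly" with a regular expression: a regular expression cannot match arbitrarily nested (balanced) braces. Here you can read why.

Anyway, here is a solution that uses regular expressions. It is a lot of code, and depending on the length of your data it may be faster or slower than your loop; you will need to test it.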
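The limitation is specific to nesting. If the braces in your data are never nested, a single pattern can locate the split points. A minimal sketch under that assumption (`SplitFlat` is a hypothetical name): the negative lookahead `(?![^{]*\})`, which VBScript's `RegExp` supports, rejects a dot when a `}` appears before any `{` to its right, i.e. when the dot sits inside a brace pair. The parts are then cut out with `Mid` at each match's `FirstIndex`, so no marker character is needed.

```vbscript
' Sketch: split on dots outside { }. Valid ONLY when braces are not nested.
Function SplitFlat(text)
    Dim re, matches, m, parts(), n, start
    Set re = New RegExp
    ' A dot is inside a pair if a "}" comes before any "{" to its right.
    re.Pattern = "\.(?![^{]*\})"
    re.Global = True
    Set matches = re.Execute(text)

    ' k top-level dots give k + 1 parts (indexes 0..k)
    ReDim parts(matches.Count)
    start = 1
    n = 0
    For Each m In matches
        ' FirstIndex is 0-based, Mid is 1-based: the dot is at FirstIndex + 1
        parts(n) = Mid(text, start, m.FirstIndex + 1 - start)
        start = m.FirstIndex + 2
        n = n + 1
    Next
    ' Whatever follows the last top-level dot is the final part
    parts(n) = Mid(text, start)
    SplitFlat = parts
End Function

Dim flatParts, flatPart
flatParts = SplitFlat("A.{B.C}.D")
For Each flatPart In flatParts
    WScript.Echo flatPart   ' A, then {B.C}, then D
Next
```

With nesting the lookahead breaks down: in `{A.{B}C}` the first dot is immediately followed by `{`, so it is wrongly treated as a split point. That is why nested braces need a loop, as in the code that follows.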

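Dim dicEncode
    Set dicEncode = WScript.CreateObject("Scripting.Dictionary")

' Matches an innermost brace pair: "{", then no braces, then "}"
Dim encodeRE
    Set encodeRE = New RegExp
    With encodeRE
        .Pattern = "\{[^{}]*\}"
        .Global = True
        .IgnoreCase = True
    End With

' Matches a key marker Chr(1) & "K<n>" & Chr(1) and captures the key
Dim decodeRE
    Set decodeRE = New RegExp
    With decodeRE
        .Pattern = "\x01(K[0-9]+)\x01"
        .Global = True
        .IgnoreCase = True
    End With

' Replacement callback: stores the matched {...} block under a new key and
' returns a marker for it. Keys are delimited with Chr(1), not with the
' Chr(0) used for splitting, so a top-level part such as "K1" cannot be
' mistaken for a key once the dots have become Chr(0).
Function encodeFunction(matchString, position, fullString)
    Dim key
        key = "K" & CStr(dicEncode.Count)
    dicEncode.Add key , matchString
    encodeFunction = Chr(1) & key & Chr(1)
End Function

' Replacement callback: the key arrives as the first submatch
Function decodeFunction(matchString, key, position, fullString)
    decodeFunction = dicEncode.Item(key)
End Function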


Dim originalString
    originalString = "{abc.def{gh.ijk}l.m}n.o.p{q.r{s{t{u.v}}}w}.x"

Dim encodedString, workBuffer

    ' Collapse innermost {...} blocks into keys, repeating until nothing
    ' changes, so nested blocks are removed from the inside out
    encodedString = originalString
    Do
        workBuffer = encodedString
        encodedString = encodeRE.Replace(encodedString,GetRef("encodeFunction"))
    Loop While encodedString <> workBuffer

    ' Every dot still present is at top level: mark it with the split character
    encodedString = Replace(encodedString, ".", Chr(0))

    ' Expand the keys again; stored blocks can contain keys of their own,
    ' so repeat until nothing changes
    Do
        workBuffer = encodedString
        encodedString = decodeRE.Replace(encodedString,GetRef("decodeFunction"))
    Loop While encodedString <> workBuffer

Dim aElements, element
    aElements = Split(encodedString, Chr(0))

    WScript.Echo originalString

    For Each element In aElements
        WScript.Echo element
    Next

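All this code does is use a regular expression to find innermost pairs of curly braces and replace each pair, together with the data it encloses, with a key marker whose original text is stored in a dictionary. The replacement is repeated until nothing changes, so nested pairs are collapsed from the inside out. Once all the "enclosed" data has been removed from the string, the remaining dots are exactly your split points: they are replaced with a new character (Chr(0) in the code), which is later used to split the string, and then the string is rebuilt by expanding the keys again. All "enclosed" dots are thereby protected, and the split can be done on the new character.

It is similar to how a statistical compressor builds its dictionary, but of course without any statistics or compression.

However, it only pays off for long strings. Otherwise, your original approach will be better.

Edited to address comments:

Here is better-performing code based on the OP's original approach. No exotic regular expressions; it just removes the per-character string concatenation and the unnecessary checks.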
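To make the inside-out order concrete, here is a minimal sketch that runs only the brace-collapsing step on the same sample string and prints each pass. For readability it replaces each innermost pair with a visible `#` instead of a dictionary key marker, so it illustrates the order of the passes, not the full encode/decode.

```vbscript
' Sketch: show the order in which "\{[^{}]*\}" collapses nested braces.
' "#" is an arbitrary visible stand-in for a key marker.
Dim traceRE, s, previous, passNo
Set traceRE = New RegExp
traceRE.Pattern = "\{[^{}]*\}"   ' a pair with no braces inside = innermost pair
traceRE.Global = True

s = "{abc.def{gh.ijk}l.m}n.o.p{q.r{s{t{u.v}}}w}.x"
passNo = 0
Do
    previous = s
    s = traceRE.Replace(s, "#")
    If s <> previous Then
        passNo = passNo + 1
        WScript.Echo "pass " & passNo & ": " & s
    End If
Loop While s <> previous

' pass 1: {abc.def#l.m}n.o.p{q.r{s{t#}}w}.x
' pass 2: #n.o.p{q.r{s#}w}.x
' pass 3: #n.o.p{q.r#w}.x
' pass 4: #n.o.p#.x
```

After the fourth pass no braces are left, and the three remaining dots are exactly the top-level split points.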

Function mySplit(originalString)
Dim changedString, currentPoint, currentChar, stringEnd, level

    changedString = originalString
    stringEnd = Len(originalString)

    ' level = current brace depth; only dots at depth 0 are split points
    level = 0
    For currentPoint = 1 To stringEnd
        currentChar = Mid(originalString, currentPoint, 1)
        If currentChar = "{" Then
            level = level + 1
        ElseIf currentChar = "}" Then
            ' Ignore an unmatched "}" instead of letting the depth go negative
            If level > 0 Then
                level = level - 1
            End If
        ElseIf level = 0 Then
            If currentChar = "." Then
                ' Overwrite the top-level dot with Chr(0); the length is unchanged,
                ' so positions still line up with originalString
                changedString = Left(changedString,currentPoint-1) & Chr(0) & Right(changedString,stringEnd-currentPoint)
            End If
        End If
    Next

    mySplit = Split(changedString, Chr(0))
End Function
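A possible variant of `mySplit`, sketched under the same rules (dots at brace depth 0 split, an unmatched `}` is ignored): instead of rebuilding the whole string around a `Chr(0)` marker at every top-level dot, it cuts each part out with `Mid` as soon as the dot that ends it is found, and collects the parts in a `Scripting.Dictionary` used as a growable list. `SplitTopLevel` is a hypothetical name, and whether it is actually faster on your data is something to time rather than assume.

```vbscript
' Sketch (hypothetical helper): slice parts with Mid instead of rebuilding
' the string at each top-level dot.
Function SplitTopLevel(text)
    Dim parts, pos, ch, level, start
    ' Dictionary used as a growable list; Items() returns an array
    ' in insertion order.
    Set parts = CreateObject("Scripting.Dictionary")
    level = 0
    start = 1
    For pos = 1 To Len(text)
        ch = Mid(text, pos, 1)
        If ch = "{" Then
            level = level + 1
        ElseIf ch = "}" Then
            ' Ignore an unmatched "}" as mySplit does
            If level > 0 Then level = level - 1
        ElseIf ch = "." And level = 0 Then
            ' Top-level dot: everything since the previous one is a part
            parts.Add parts.Count, Mid(text, start, pos - start)
            start = pos + 1
        End If
    Next
    ' The text after the last top-level dot is the final part
    parts.Add parts.Count, Mid(text, start)
    SplitTopLevel = parts.Items()
End Function

Dim topParts, topPart
topParts = SplitTopLevel("{abc.def{gh.ijk}l.m}n.o.p{q.r{s{t{u.v}}}w}.x")
For Each topPart In topParts
    WScript.Echo topPart
Next
```

For the sample string used earlier in this answer, this yields the same four elements as `mySplit`. One difference: for an empty input it returns a single empty element, whereas `Split` on an empty string returns an empty array.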