Can capture each occurrence of a group individually, but not as a repeating group

Date: 2019-01-14 12:00:42

Tags: regex excel vba

I have many files whose names end with a version number. For example:

Xxxxx V2.txt
Xxxxx V2.3.txt
Xxxxx V2.10.txt
Xxxxx V2.10.3.txt

I use Regex to extract the parts of the version number so that I can sort the files† correctly and calculate the next version number‡.

†For example: V2.2 comes before V2.10, and V2.2 comes before V2.2.3.

‡For example: the version after V2.9 is V2.10.
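The two footnote rules can be checked with a short sketch. It is written in Python rather than VBA only so that it runs standalone. version_key and next_version are hypothetical helper names. The sketch assumes that "next version" means incrementing the last component only.

```python
def version_key(version):
    # "2.10.3" -> (2, 10, 3); tuples compare element by element as integers,
    # and a tuple that is a prefix of a longer one sorts first
    return tuple(int(part) for part in version.split("."))


def next_version(version):
    # Assumption: the next version increments only the last component
    parts = version.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)


print(sorted(["2.10", "2.2.3", "2.2"], key=version_key))  # ['2.2', '2.2.3', '2.10']
print(next_version("2.9"))  # 2.10
```

Comparing the parts as integers, not as text, is what puts 2.10 after 2.9.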

I can handle each style of version number with its own pattern, but I cannot write a single Regex pattern that covers all of them.

Text               Pattern                          Value(s) extracted
Xxxxx V2.txt       Xxxxx V(\d+)\.txt                2
Xxxxx V2.3.txt     Xxxxx V(\d+)\.(\d+)\.txt         2  3
Xxxxx V2.10.3.txt  Xxxxx V(\d+)\.(\d+)\.(\d+)\.txt  2  10  3
Xxxxx V2.10.3.txt  Xxxxx V(\d+){\.(\d+)}*\.txt      No match

I do not understand why the last pattern does not work for every style of version number. Any guidance appreciated.
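A likely reason is that braces are not a grouping construct in regular expressions. {m,n} is a repetition count. When "{" does not start a valid count, engines such as Python's re treat it as a literal character, so the pattern demands a "{" in the file name. Grouping uses parentheses, and (?:...) groups without capturing. The sketch below uses Python only because it runs standalone. It assumes VBScript's RegExp treats the braces the same way, which is consistent with the "No match" result in the table.

```python
import re

text = "Xxxxx V2.10.3.txt"

# A "{" that does not start a valid {m,n} count is a literal character in Python's re,
# so this pattern can only match a name that actually contains "{"
braces = r"Xxxxx V(\d+){\.(\d+)}*\.txt"
print(re.search(braces, text))  # None

# Parentheses do the grouping; (?:...) groups without creating a capture
parens = r"Xxxxx V(\d+)(?:\.(\d+))*\.txt"
print(re.search(parens, text) is not None)  # True
```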

New section in response to comments

I had hoped there was a simple error in my Regex pattern and that my code was irrelevant. I have tidied up my test code into the following:

Sub CtrlTestCapture()

  Dim Patterns As Variant
  Dim Texts As Variant

  Texts = Array("Xxxxx V12.txt", _
                "Xxxxx V12.3.txt", _
                "Xxxxx V12.4.5.txt", _
                "Xxxxx V12.4.5.3.txt")

  Patterns = Array("Xxxxx V(\d+)\.txt", _
                   "Xxxxx V(\d+)\.(\d+)\.txt", _
                   "Xxxxx V(\d+)\.(\d+)\.(\d+)\.txt", _
                   "Xxxxx V(\d+){\.(\d+)}+\.txt", _
                   "Xxxxx V(\d+)(?:\.(\d+))?(?:\.(\d+))?\.txt" , _
                   "Xxxxx V(\d+)(\.(\d+))*\.txt")

  Call TestCapture(Patterns, Texts)

End Sub
Sub TestCapture(ByRef Patterns As Variant, ByRef Texts As Variant)

  Dim InxM As Long
  Dim InxS As Long
  Dim Matches As MatchCollection
  Dim PatternCrnt As Variant
  ' Early binding: requires a reference to "Microsoft VBScript Regular Expressions 5.5"
  Dim RegEx As New RegExp
  Dim SubMatchCrnt As Variant
  Dim TextCrnt As Variant

  With RegEx
    .Global = True         ' Find all matches
    .MultiLine = False     ' ^ and $ match only at the start and end of the whole string
    .IgnoreCase = True

    For Each PatternCrnt In Patterns
      .Pattern = PatternCrnt

      For Each TextCrnt In Texts
        Debug.Print "==========================================="
        Debug.Print "   Pattern: """ & PatternCrnt & """"
        Debug.Print "      Text: """ & TextCrnt & """"
        If Not .test(TextCrnt) Then
          Debug.Print Space(12) & "Text does not match pattern"
        Else
          Set Matches = .Execute(TextCrnt)
          If Matches.Count = 0 Then
            Debug.Print Space(12) & "Match but no captures"
          Else
            For InxM = 0 To Matches.Count - 1
              Debug.Print "-------------------------------------------"
              With Matches(InxM)
                Debug.Print "     Match: " & InxM + 1
                Debug.Print "     Value: """ & .Value & """"
                Debug.Print "    Length: " & .Length
                Debug.Print "FirstIndex: " & .FirstIndex
                For InxS = 0 To .SubMatches.Count - 1
                  Debug.Print "  SubMatch: " & InxS + 1 & " """ & .SubMatches(InxS) & """"
                Next
              End With
            Next
          End If
        End If
      Next
    Next
    Debug.Print "==========================================="

  End With

End Sub

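With this test code, Wiktor Stribiżew's regex pattern gives better results than it did with my untidy original code, so I will have to go back to that code to find the error. With this test code, the output for Wiktor Stribiżew's pattern is: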

===========================================
   Pattern: "Xxxxx V(\d+)(?:\.(\d+))?(?:\.(\d+))?\.txt"
      Text: "Xxxxx V12.txt"
-------------------------------------------
     Match: 1
     Value: "Xxxxx V12.txt"
    Length: 13
FirstIndex: 0
  SubMatch: 1 "12"
  SubMatch: 2 ""
  SubMatch: 3 ""
===========================================
   Pattern: "Xxxxx V(\d+)(?:\.(\d+))?(?:\.(\d+))?\.txt"
      Text: "Xxxxx V12.3.txt"
-------------------------------------------
     Match: 1
     Value: "Xxxxx V12.3.txt"
    Length: 15
FirstIndex: 0
  SubMatch: 1 "12"
  SubMatch: 2 "3"
  SubMatch: 3 ""
===========================================
   Pattern: "Xxxxx V(\d+)(?:\.(\d+))?(?:\.(\d+))?\.txt"
      Text: "Xxxxx V12.4.5.txt"
-------------------------------------------
     Match: 1
     Value: "Xxxxx V12.4.5.txt"
    Length: 17
FirstIndex: 0
  SubMatch: 1 "12"
  SubMatch: 2 "4"
  SubMatch: 3 "5"
===========================================
   Pattern: "Xxxxx V(\d+)(?:\.(\d+))?(?:\.(\d+))?\.txt"
      Text: "Xxxxx V12.4.5.3.txt"
            Text does not match pattern
===========================================

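This has a fixed number of captures rather than the variable number I was attempting. I also have to work out how to extend it to handle "12.4.5.3", which is the most complex version-number style I have seen. It is not perfect, but it is definitely an improvement on my current workaround. Your pattern uses regex syntax I do not recognise, so I need to study it carefully.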
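Here is a minimal Python sketch of extending the pattern to four parts. It assumes one more optional (?:\.(\d+))? block is appended to Wiktor Stribiżew's pattern. version_parts is a hypothetical helper. The pattern still has a fixed number of captures, so a fifth part would need a fifth block.

```python
import re

# Assumption: one more optional (?:\.(\d+))? block than the posted pattern, allowing up to four parts
pattern = re.compile(r"Xxxxx V(\d+)(?:\.(\d+))?(?:\.(\d+))?(?:\.(\d+))?\.txt")


def version_parts(name):
    """Hypothetical helper: return the captured parts, dropping groups that did not take part."""
    m = pattern.search(name)
    if m is None:
        return None
    # Unused groups are None in Python; VBScript reports them as "" (see the output above)
    return [g for g in m.groups() if g is not None]


for name in ["Xxxxx V12.txt", "Xxxxx V12.3.txt", "Xxxxx V12.4.5.txt", "Xxxxx V12.4.5.3.txt"]:
    print(name, version_parts(name))
```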

Using the code above, Tiw's regex pattern produced this output:

===========================================
   Pattern: "Xxxxx V(\d+)(\.(\d+))*\.txt"
      Text: "Xxxxx V12.txt"
-------------------------------------------
     Match: 1
     Value: "Xxxxx V12.txt"
    Length: 13
FirstIndex: 0
  SubMatch: 1 "12"
  SubMatch: 2 ""
  SubMatch: 3 ""
===========================================
   Pattern: "Xxxxx V(\d+)(\.(\d+))*\.txt"
      Text: "Xxxxx V12.3.txt"
-------------------------------------------
     Match: 1
     Value: "Xxxxx V12.3.txt"
    Length: 15
FirstIndex: 0
  SubMatch: 1 "12"
  SubMatch: 2 ".3"
  SubMatch: 3 "3"
===========================================
   Pattern: "Xxxxx V(\d+)(\.(\d+))*\.txt"
      Text: "Xxxxx V12.4.5.txt"
-------------------------------------------
     Match: 1
     Value: "Xxxxx V12.4.5.txt"
    Length: 17
FirstIndex: 0
  SubMatch: 1 "12"
  SubMatch: 2 ".5"
  SubMatch: 3 "5"
===========================================
   Pattern: "Xxxxx V(\d+)(\.(\d+))*\.txt"
      Text: "Xxxxx V12.4.5.3.txt"
-------------------------------------------
     Match: 1
     Value: "Xxxxx V12.4.5.3.txt"
    Length: 19
FirstIndex: 0
  SubMatch: 1 "12"
  SubMatch: 2 ".3"
  SubMatch: 3 "3"
===========================================

That is, it always seems to capture the first part, the last part including its dot, and the last part without the dot. Promising, but not enough.
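This behaviour is expected. A quantified group is still a single group, and each repetition overwrites the previous capture, so only the last iteration survives. The Python sketch below reproduces the same submatches. It assumes Python's re and VBScript's RegExp behave alike here, which agrees with the output above.

```python
import re

# Tiw's pattern: groups 2 and 3 sit inside the repeated group (\.(\d+))*
pattern = re.compile(r"Xxxxx V(\d+)(\.(\d+))*\.txt")

for name in ["Xxxxx V12.txt", "Xxxxx V12.3.txt", "Xxxxx V12.4.5.txt", "Xxxxx V12.4.5.3.txt"]:
    # Each repetition overwrites groups 2 and 3, so only the last one is kept;
    # with no repetition at all they are None here (VBScript shows "")
    print(name, pattern.search(name).groups())
```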

Part 3

I overlooked the request to state clearly the result I am seeking.

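I use version numbers on all my important files. I also receive files from other people that carry version numbers, some far more complex than mine. I always make the version number the last part of the file name, always preceded by "V". If a file I receive does not follow my format, I rename it so that it does. So I have files such as: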

  • Xxxxx VN.xxx
  • Xxxxx VN.N.xxx
  • Xxxxx VN.N.N.xxx
  • Xxxxx VN.N.N.N.xxx

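I want to extract the Ns into a variable-length array or collection so that I can process them with generic routines. In fact, I already have those generic routines, but they rely on some messy VBA code to extract the Ns. I thought Regex would let me tidy that code up.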

2 Answers:

Answer 0 (score: 3)

Try this regex:

V(\d+(?:\.\d+)*)\.txt$

The required version is captured in Group 1. You can then split the contents of Group 1 on the dot (.).


Code:

Dim objReg, strFile, objMatches, strVersion, arrVersion
strFile = "Xxxxx V2.3.txt"
Set objReg = New RegExp
objReg.Global = True
objReg.Multiline = True
objReg.Pattern = "V(\d+(?:\.\d+)*)\.txt$"

If objReg.Test(strFile) Then
    Set objMatches = objReg.Execute(strFile)
    strVersion = objMatches.item(0).submatches.item(0)    'The full version number, e.g. "2.3"
    arrVersion = Split(strVersion, ".")                   'Each number in the version, stored in an array
End If

Regex explanation

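  • V(\d+(?:\.\d+)*)\.txt$
  • V - matches V
  • (\d+(?:\.\d+)*) - matches 1 or more digits. After matching as many digits as possible, it matches 0 or more occurrences of a dot (.) followed by 1 or more digits. The whole match is captured in Group 1 and is the version number you need
  • \.txt - matches .txt
  • $ - asserts the end of the line.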
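Here is the same approach as a Python sketch (Python only because it runs standalone). It returns the variable-length list of numbers asked for in Part 3. version_numbers is a hypothetical helper name. The parts are converted to integers on the assumption that they will be compared numerically.

```python
import re

# Answer 0's pattern: group 1 holds the whole dotted version, e.g. "12.4.5.3"
VERSION_RE = re.compile(r"V(\d+(?:\.\d+)*)\.txt$")


def version_numbers(file_name):
    """Return the version parts as a list of ints, or None if the name does not end in V<version>.txt."""
    m = VERSION_RE.search(file_name)
    if m is None:
        return None
    # Split group 1 on the dots, as the answer suggests, then convert each part
    return [int(part) for part in m.group(1).split(".")]


for name in ["Xxxxx V12.txt", "Xxxxx V12.3.txt", "Xxxxx V12.4.5.txt", "Xxxxx V12.4.5.3.txt"]:
    print(name, version_numbers(name))
```

Because the repetition sits inside one capturing group, the group keeps the whole dotted string rather than only the last repetition.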

Answer 1 (score: 1)

If you prefer, here is a non-regex solution. You can convert the version number into a number and then sort it.

Sub GetOrderedList()
    Dim Texts               As Variant
    Dim FileName            As String
    Dim FileArrayList       As Object
    Dim Item                As Variant
    Dim i                   As Long
    Dim PeriodPosition      As Long

    Set FileArrayList = CreateObject("System.Collections.ArrayList")

    Texts = Array("Xxxxx V12.txt", _
                  "Xxxxx V12.3.txt", _
                  "Xxxxx V12.4.5.txt", _
                  "Xxxxx V12.4.5.3.txt")


    For i = LBound(Texts) To UBound(Texts)
        'You could use the FileSystemObject to make this a bit easier
        FileName = Replace(Replace(Split(Texts(i), " ")(UBound(Split(Texts(i), " "))), "V", ""), ".txt", "")
        PeriodPosition = InStr(1, FileName, ".")

        'Turn every period after the first into "0" so the version reads like a decimal number
        If PeriodPosition > 0 Then FileName = Left$(FileName, PeriodPosition) & Replace(FileName, ".", "0", PeriodPosition + 1)
        FileArrayList.Add FileName
    Next

    'Sort
    FileArrayList.Sort

    'Print out, ascending order
    For Each Item In FileArrayList
        Debug.Print Item
    Next

End Sub
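There is one caveat with this conversion, sketched in Python. The sketch assumes the sort compares the converted values the way ordinary string ordering does. The conversion leaves "2.10" as "2.10", so it sorts ahead of "2.2", whereas the question wants V2.2 before V2.10. Read as numbers, 2.10 equals 2.1, so the order is the same either way. answer1_key is a hypothetical helper that mimics the Left$/Replace step above.

```python
def answer1_key(version):
    """Hypothetical helper mimicking GetOrderedList: keep the first '.', turn later '.' into '0'."""
    pos = version.find(".")
    if pos < 0:
        return version
    return version[: pos + 1] + version[pos + 1 :].replace(".", "0")


versions = ["2.2", "2.10", "2.2.3", "2.9"]
# Plain string ordering of the converted values
print(sorted(versions, key=answer1_key))  # ['2.10', '2.2', '2.2.3', '2.9']
```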