Regex Excel Mid Issue

Time: 2013-06-03 16:56:03

Tags: regex excel

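I am essentially trying to use a regular expression to extract a dollar amount, but I can't figure out how to pull the amount out, since it may have a varying number of digits. Below is an example of the amount field I want to extract; it always sits in the middle of the string: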

<field1>05/14/2013</field1><amount>3,100,000.00</amount><field3>026002561</field3>

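What I have right now: <amount>.*</amount> (this does not give me the result I want)

For this field, I want to extract the 3.1 million figure. The structure around the dollar figure (HTML-like) will always be the same. Any help is appreciated.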

2 Answers:

Answer 0: (score: -1)

Excel

Since you are doing this in Excel, you may want to consider using this formula:

=MID(B1,SEARCH("<amount>",B1)+8,SEARCH("</amount>",B1)-(SEARCH("<amount>",B1) + 8))

  • B1 = the input string
  • +8 compensates for the width of the string <amount>
  • Column C shows the formula used
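
The index arithmetic behind the formula can be checked outside Excel. The following is only a sketch in Python (not part of the original answer); the function name mid_between and its parameters are made up for illustration. One difference to keep in mind: SEARCH is case-insensitive, while this sketch compares case-sensitively.

```python
def mid_between(text, open_tag="<amount>", close_tag="</amount>"):
    """Return the text between open_tag and close_tag, or None if either is missing."""
    start = text.find(open_tag)
    if start == -1:
        return None
    # Skip past the opening tag; len("<amount>") is 8, the same +8 as in the formula.
    start += len(open_tag)
    end = text.find(close_tag, start)
    if end == -1:
        return None
    # MID(text, start, end - start) corresponds to this slice.
    return text[start:end]


sample = ("<field1>05/14/2013</field1><amount>3,100,000.00</amount>"
          "<field3>026002561</field3>")
print(mid_between(sample))  # 3,100,000.00
```

Returning None for a missing tag is a design choice of this sketch; the Excel formula would show a #VALUE! error in that case.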


Regex

If you are doing this with VBA and a regular expression, you could use this pattern: <(amount)\b[^>]*>([^<]*)<\/\1>


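This VB.NET example is included only to show how the regex populates group 2 with each dollar value found inside the amount tags (group 0 is the whole match and group 1 is the tag name).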

Imports System.Text.RegularExpressions
Module Module1
  Sub Main()
    ' Two sample records separated by a line break
    Dim sourcestring As String = "<field1>05/14/2013</field1><amount>3,100,000.00</amount><field3>026002561</field3>" & vbCrLf & _
        "<field1>05/14/2013</field1><amount>4,444,444.00</amount><field3>026002561</field3>"
    ' Group 1 = tag name, group 2 = the text between the opening and closing tag
    Dim re As Regex = New Regex("<(amount)\b[^>]*>([^<]*)<\/\1>", RegexOptions.IgnoreCase Or RegexOptions.Multiline Or RegexOptions.Singleline)
    Dim mc As MatchCollection = re.Matches(sourcestring)
    Dim mIdx As Integer = 0
    For Each m As Match In mc
      ' Print every group of the current match, including group 0 (the full match)
      For groupIdx As Integer = 0 To m.Groups.Count - 1
        Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames()(groupIdx), m.Groups(groupIdx).Value)
      Next
      mIdx = mIdx + 1
    Next
  End Sub
End Module

$matches Array:
(
    [0] => Array
        (
            [0] => <amount>3,100,000.00</amount>
            [1] => <amount>4,444,444.00</amount>
        )

    [1] => Array
        (
            [0] => amount
            [1] => amount
        )

    [2] => Array
        (
            [0] => 3,100,000.00
            [1] => 4,444,444.00
        )

)
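
Group 2 is still text at this point. If a numeric value is needed (an assumption, since the question only asks for extraction), the thousands separators have to be dropped before conversion. The following Python sketch uses the same pattern; the helper name amounts_as_decimals is hypothetical, and it assumes every amount is a well-formed number with commas as thousands separators.

```python
import re
from decimal import Decimal

# Same pattern as above: group 1 = tag name, group 2 = text between the tags.
AMOUNT_RE = re.compile(r"<(amount)\b[^>]*>([^<]*)</\1>", re.IGNORECASE)


def amounts_as_decimals(text):
    """Return each amount (capture group 2) as a Decimal, without thousands separators."""
    return [Decimal(m.group(2).replace(",", "")) for m in AMOUNT_RE.finditer(text)]


source = (
    "<field1>05/14/2013</field1><amount>3,100,000.00</amount><field3>026002561</field3>\n"
    "<field1>05/14/2013</field1><amount>4,444,444.00</amount><field3>026002561</field3>"
)
print(amounts_as_decimals(source))
# [Decimal('3100000.00'), Decimal('4444444.00')]
```

Decimal is used rather than float so that the cents are kept exactly; an empty or non-numeric amount would raise an exception in this sketch.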

Answer 1: (score: -1)

Use Excel VBA to extract the capture groups.

VBA code

' Requires a reference to "Microsoft VBScript Regular Expressions 5.5" (Tools > References)
Function TestRegExp(ByVal myString As String, _
                      ByVal myPattern As String, _
                      Optional ByVal separator As String = "|") As String
   Dim objRegExp As RegExp
   Dim colMatches As MatchCollection
   Dim RetStr As String
   Dim i As Long, j As Long

   Set objRegExp = New RegExp
   objRegExp.Pattern = myPattern
   objRegExp.IgnoreCase = True
   objRegExp.Global = True   ' return every match, not only the first one

   If (objRegExp.Test(myString) = True) Then
    Set colMatches = objRegExp.Execute(myString)
    For i = 0 To colMatches.Count - 1
        ' SubMatches holds the capture groups of match i
        For j = 0 To colMatches.Item(i).SubMatches.Count - 1
            If (RetStr <> "") Then
                RetStr = RetStr & separator & colMatches.Item(i).SubMatches.Item(j)
            Else
                RetStr = colMatches.Item(i).SubMatches.Item(j)
            End If
        Next
    Next
   Else
    RetStr = "No Match"
   End If
   TestRegExp = RetStr
End Function

In Excel

The formula to test this in Excel would be:

=TestRegExp(B2,"<amount>([^<]*)<\/amount>")

where cell B2 contains your text:

<field1>05/14/2013</field1><amount>3,100,000.00</amount><field3>026002561</field3>
Output: 3,100,000.00

OR

<field1>05/14/2013</field1><amount>3,100,000.00</amount><field3>026002561</field3><amount>999</amount>  
Output: 3,100,000.00|999

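Please note:

  1. Use .*? instead of .* in your pattern. This solves the problem of multiple amount tags, because the question mark makes the match lazy, so it stops at the first closing tag. You can choose the separator in the code.
  2. The trick is to use SubMatches to get the capture groups.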
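
The difference between the greedy and the lazy quantifier is easy to see on the two-tag input above. This is a small Python sketch rather than VBA, shown only to illustrate the matching behaviour; the variable names are made up for the example.

```python
import re

text = ("<field1>05/14/2013</field1><amount>3,100,000.00</amount>"
        "<field3>026002561</field3><amount>999</amount>")

# Greedy: .* runs to the LAST </amount>, swallowing everything in between.
greedy = re.findall(r"<amount>(.*)</amount>", text)

# Lazy: .*? stops at the FIRST </amount> after each opening tag.
lazy = re.findall(r"<amount>(.*?)</amount>", text)

# Negated class: [^<]* cannot cross a tag boundary, so it behaves like the lazy form here.
negated = re.findall(r"<amount>([^<]*)</amount>", text)

print(greedy)             # ['3,100,000.00</amount><field3>026002561</field3><amount>999']
print(lazy)               # ['3,100,000.00', '999']
print("|".join(negated))  # 3,100,000.00|999
```

The greedy pattern returns a single match that spans both tags, which is why the original <amount>.*</amount> did not give the expected result.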