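I'm essentially trying to extract a dollar amount using a regular expression, but I can't figure out how to extract an amount that may have a varying number of digits. Below is an example of the field I want to extract the amount from; the amount always sits in the middle of the field: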
<field1>05/14/2013</field1><amount>3,100,000.00</amount><field3>026002561</field3>
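What I have now: <amount>.*</amount>
(This doesn't give me the result I want.)
For this field, I want to extract the 3.1 million figure. The structure around the dollar figure (HTML-like) will always be the same. Any help is appreciated.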
Answer 0: (score: -1)
Since you are doing this in Excel, you may want to consider this formula:
=MID(B1,SEARCH("<amount>",B1)+8,SEARCH("</amount>",B1)-(SEARCH("<amount>",B1) + 8))
where B1 is the input string, and +8 compensates for the length of the string <amount> (8 characters).
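The index arithmetic in that formula can be sketched outside Excel. The snippet below is only an illustration in Python, not part of the original answer; the function name `extract_between` and its parameters are hypothetical. It assumes both tags are present (otherwise `str.index` raises `ValueError`), and unlike Excel's `SEARCH` it is case-sensitive and zero-based.

```python
def extract_between(text: str, open_tag: str = "<amount>", close_tag: str = "</amount>") -> str:
    """Return the text between the first open_tag and the next close_tag."""
    # Same idea as SEARCH("<amount>", B1) + 8: skip past the opening tag itself.
    start = text.index(open_tag) + len(open_tag)
    # Same idea as SEARCH("</amount>", B1): where the closing tag begins.
    end = text.index(close_tag, start)
    return text[start:end]


row = "<field1>05/14/2013</field1><amount>3,100,000.00</amount><field3>026002561</field3>"
print(extract_between(row))  # 3,100,000.00
```

Using `len(open_tag)` instead of a hard-coded 8 keeps the offset correct if the tag name ever changes.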
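If you are doing this with VBA and a regular expression, you can use the regex: <(amount)\b[^>]*>([^<]*)<\/\1>
This VB.net example only shows how the regex populates group 2 with each dollar value found inside an amount tag (group 1 holds the tag name).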
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        ' Two sample rows, joined with a line break
        Dim sourcestring As String = "<field1>05/14/2013</field1><amount>3,100,000.00</amount><field3>026002561</field3>" & vbCrLf & _
            "<field1>05/14/2013</field1><amount>4,444,444.00</amount><field3>026002561</field3>"
        ' Group 1 = tag name, group 2 = text between the tags
        Dim re As Regex = New Regex("<(amount)\b[^>]*>([^<]*)<\/\1>", RegexOptions.IgnoreCase Or RegexOptions.Multiline Or RegexOptions.Singleline)
        Dim mc As MatchCollection = re.Matches(sourcestring)
        Dim mIdx As Integer = 0
        For Each m As Match In mc
            ' Print every group of every match
            For groupIdx As Integer = 0 To m.Groups.Count - 1
                Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames()(groupIdx), m.Groups(groupIdx).Value)
            Next
            mIdx = mIdx + 1
        Next
    End Sub
End Module
$matches Array:
(
    [0] => Array
        (
            [0] => <amount>3,100,000.00</amount>
            [1] => <amount>4,444,444.00</amount>
        )
    [1] => Array
        (
            [0] => amount
            [1] => amount
        )
    [2] => Array
        (
            [0] => 3,100,000.00
            [1] => 4,444,444.00
        )
)
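Since the question asks for the 3.1 million figure as a number, the captured text still needs its thousands separators removed before conversion. A minimal sketch of that step, written in Python purely for illustration: the pattern is the one from this answer, while `AMOUNT_RE` and `extract_amounts` are hypothetical names and the use of `Decimal` is an assumption about how the value should be stored.

```python
import re
from decimal import Decimal
from typing import List

# Group 1 is the tag name, group 2 is the text between the tags.
AMOUNT_RE = re.compile(r"<(amount)\b[^>]*>([^<]*)</\1>", re.IGNORECASE)


def extract_amounts(text: str) -> List[Decimal]:
    """Return every amount in text as a Decimal, commas removed."""
    return [Decimal(m.group(2).replace(",", "")) for m in AMOUNT_RE.finditer(text)]


source = (
    "<field1>05/14/2013</field1><amount>3,100,000.00</amount><field3>026002561</field3>\n"
    "<field1>05/14/2013</field1><amount>4,444,444.00</amount><field3>026002561</field3>"
)
amounts = extract_amounts(source)
print(amounts)  # [Decimal('3100000.00'), Decimal('4444444.00')]
```

`Decimal` is chosen over `float` here so that currency values keep their exact cents.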
Answer 1: (score: -1)
Use Excel VBA to extract the capture groups.
VBA code
' Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
' (Tools > References in the VBA editor).
Function TestRegExp(ByVal myString As String, _
                    ByVal myPattern As String, _
                    Optional seperator As String = "|") As String
    Dim objRegExp As RegExp
    Dim colMatches As MatchCollection
    Dim RetStr As String
    Dim i As Long, j As Long

    Set objRegExp = New RegExp
    objRegExp.Pattern = myPattern
    objRegExp.IgnoreCase = True
    objRegExp.Global = True

    If objRegExp.Test(myString) Then
        Set colMatches = objRegExp.Execute(myString)
        ' Join every capture group of every match with the separator
        For i = 0 To colMatches.Count - 1
            For j = 0 To colMatches.Item(i).SubMatches.Count - 1
                If RetStr <> "" Then
                    RetStr = RetStr & seperator & colMatches.Item(i).SubMatches.Item(j)
                Else
                    RetStr = colMatches.Item(i).SubMatches.Item(j)
                End If
            Next
        Next
    Else
        RetStr = "No Match"
    End If
    TestRegExp = RetStr
End Function
Excel
The formula to test this in Excel would be:
=TestRegExp(B2,"<amount>([^<]*)<\/amount>")
where cell B2 contains your text:
<field1>05/14/2013</field1><amount>3,100,000.00</amount><field3>026002561</field3>
Output: 3,100,000.00
or
<field1>05/14/2013</field1><amount>3,100,000.00</amount><field3>026002561</field3><amount>999</amount>
Output: 3,100,000.00|999
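Please note: if you write the pattern with a dot, use .*? instead of .* between the tags. This helps when there are multiple amount tags, because the question mark makes the match lazy, so it stops at the first closing tag (a [^<]* group avoids the same problem by never crossing a tag). You can choose the separator in the code.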
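The difference between the greedy and lazy forms is easiest to see side by side. The following is an illustrative Python sketch rather than the VBA from this answer; the variable names are hypothetical, and the sample text is the two-tag example shown earlier.

```python
import re

text = (
    "<field1>05/14/2013</field1><amount>3,100,000.00</amount>"
    "<field3>026002561</field3><amount>999</amount>"
)

# Greedy: .* runs to the LAST </amount>, swallowing everything in between.
greedy = re.findall(r"<amount>(.*)</amount>", text)
# Lazy: .*? stops at the FIRST </amount> after each opening tag.
lazy = re.findall(r"<amount>(.*?)</amount>", text)
# Negated class: [^<]* cannot cross a tag boundary at all.
negated = re.findall(r"<amount>([^<]*)</amount>", text)

print(greedy)          # one long match spanning both tags
print("|".join(lazy))  # 3,100,000.00|999
```

For this input the lazy and negated-class patterns return the same two values, while the greedy pattern returns a single match that still contains markup.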