因此,我每天都会收到包含相关信息的电子邮件。不幸的是,由于某种原因,数据是在电子邮件正文中发送的,而不是作为附件发送的。那好吧。我使用Excel来使用VBA来抓取Outlook。
Sub mytry()
Dim olapp As Object
Dim olmapi As Object
Dim olmail As Object
Dim olitem As Object
Dim lrow As Integer
Dim olattach As Object
Dim str As String
Dim TextWeNeedToParse as String
Const num As Integer = 6
Const path As String = "C:\HP\"
Const emailpath As String = "C:\Dell\"
Const olFolderInbox As Integer = 6
Set olp = CreateObject("outlook.application")
Set olmapi = olp.getnamespace("MAPI")
Set olmail = olmapi.getdefaultfolder(num)
If olmail.items.restrict("[ReceivedTime]>=""&MacroDate&12:00am&""").Count = 0 Then
Else
For Each olitem In olmail.items.restrict("[ReceivedTime]>=""&MacroDate&12:00am&""")
TextWeNeedToParse = olitem.body
'Recursive text parsing here
Next olitem
End If
好的,所以这段代码片段应该将文本的整个主体变成一个字符串。现在我们可以传递字符串并操纵它。
我正在处理的文字样本:
WAL +300bp QTY
(M) FCTR SECURITY CPN ASK 1mPSA TYPE
0.77 1.15 458 0.04 GNR 2012-61 CA 2.00 99-16 217 SEQ
1.39 2.26 120 0.76 GNR 2005-13 AE 5.00 102-24 223 SUP
1.40 18.16 45 0.65 GNR 2015-157 NH 2.50 95-16 215 EXCH,+
1.50 21.56 25 0.94 GNR 2017-103 HD 3.00 98-08 375 PAC-2
因此我可以通过几种不同的方式来解决这个问题,但我并不完全了解所有这些内容。
1)我可以尝试计算存在多少回车,并进行循环。然后"计数"确定一切都在哪里的空间。不太清楚它的效果如何。
2)我可以在中间重新编写唯一的ID,如果我能弄清楚如何正则表达第n个实例(我被卡住的一个主要点),我也可以用它来正则表达式数字 - 例如,第一行将是由空格包围的直数/小数的1-5实例,以及number-number-dash-number-number的第一个实例。
我正在通过它的示例正则表达式代码:
Function regex(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant
Dim inputRegexObj As New VBScript_RegExp_55.RegExp, outputRegexObj As New VBScript_RegExp_55.RegExp, outReplaceRegexObj As New VBScript_RegExp_55.RegExp
Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object
Dim replaceNumber As Integer
With inputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = matchPattern
End With
With outputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "\$(\d+)"
End With
With outReplaceRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
End With
Set inputMatches = inputRegexObj.Execute(strInput)
If inputMatches.Count = 0 Then
regex = False
Else
Set replaceMatches = outputRegexObj.Execute(outputPattern)
For Each replaceMatch In replaceMatches
replaceNumber = replaceMatch.SubMatches(0)
outReplaceRegexObj.Pattern = "\$" & replaceNumber
If replaceNumber = 0 Then
outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).Value)
Else
If replaceNumber > inputMatches(0).SubMatches.Count Then
'regex = "A to high $ tag found. Largest allowed is $" & inputMatches(0).SubMatches.Count & "."
regex = CVErr(xlErrValue)
Exit Function
Else
outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1))
End If
End If
Next
regex = outputPattern
End If
End Function
3)我可以尝试上面的一些方法,但使用递归。我的递归相当薄弱。
因此,一旦我提取了文本字符串,我想我需要像:
Sub QuickExample(Dim Cusip as String, Dim PriceStr as variant, Dim SpreadStr as variant)
Dim ws as WorkSheet
Set ws = thisworkbook.sheets("Results")
LastRow = ws.Cells(sht.Rows.Count, "A").End(xlUp).Row
ws.cells(Lastrow,1).value2 = Cusip
ws.cells(Lastrow,2).value2 = PriceStr
ws.cells(Lastrow,3).value2 = SpreadStr
End Sub
最后:
Sub ParsingDate(EmailText as String)
Dim CarriageReturns As Long
CarriageReturns = Len(EmailText) - Len(Replace(EmailText, Chr(10), ""))
For i = 1 to CarriageReturns
'Parse out the data for the ith row, return it to the function above
Next i
End Sub
我正在努力解析的实际解析行为 - 我如何正确得到第n个结果,而且只得到第n个结果?即使添加了一些额外的空格或线条,我如何确保它仍能继续工作?有没有办法只使用正则表达式,"看"在给定表达式的第n个发现周围?没有很多递归就可以做到这一点吗?
谢谢
答案 0 :(得分:1)
WAL +300bp QTY
(M) FCTR SECURITY CPN ASK 1mPSA TYPE
0.77 1.15 458 0.04 GNR 2012-61 CA 2.00 99-16 217 SEQ
1.39 2.26 120 0.76 GNR 2005-13 AE 5.00 102-24 223 SUP
1.40 18.16 45 0.65 GNR 2015-157 NH 2.50 95-16 215 EXCH,+
1.50 21.56 25 0.94 GNR 2017-103 HD 3.00 98-08 375 PAC-2
This seems like a pretty well formatted table. Perhaps pop each line into an array using Split()
and then each field into an array, again using split()
:
Sub dump()
arrLine = Split(TextWeNeedToParse, Chr(10))
For Each Line In arrLine
For Each field In Split(Line, " ")
Debug.Print field
Next
Next
End Sub
That's super short and runs quick. You are just an if statement and counter (or regex test) away from getting the exact items you want.
Testing/counting may be easier if you remove multiple spaces so the split()
puts each element in it's proper place. You could employee a loop to remove multiple spaces before running this:
Fully implemented it might be something like:
<your code to get the bod>
'remove multiple spaces from string for parsing
Do While InStr(1, TextWeNeedToParse, " ")
TextWeNeedToParse= Replace(TextWeNeedToParse, " ", " ")
Loop
'Get each line into an array element
arrLine = Split(TextWeNeedToParse, Chr(10))
'Loop through the array
For Each Line In arrLine
'dump fields to an array
arrFields = Split(Line, " ")
'and spit out a particular element (your "unique id" is element 5)
If UBound(arrFields) >= 5 Then Debug.Print "unique id:"; arrFields(5)
Next