I have a text file that I want to parse and "clean up". Sample data from the file:
Trade '4379160'\Acquire Day 2015-05-07 Create acquire_day
Trade '4379160'\Fund XXXY Create acquirer_ptynbr
Trade '4379160'\Assinf Create assinf
Trade '4379160'\Authorizer Create authorizer_usrnbr
Trade '4379160'\Base Curr Equivalent 0 Create base_cost_dirty
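What I want is to get the first 2 "fields" after the first backslash, for example: Acquire Day 2015-05-07. Note that the second field is sometimes empty (that is fine, and I don't need any of the Create strings). What I did was use RegEx to first find everything after the backslash and then pick out the 2 required fields. My test code so far: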
Private Sub SanitiseTradeAudit(fileInput)
    Dim objFSO, objFile, regEx, validTxt, validTxt1, arrValidTxt, i
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objFile = objFSO.OpenTextFile(fileInput, 1)
    validTxt = objFile.ReadAll
    objFile.Close
    Set objFile = Nothing
    Set regEx = New RegExp
    regEx.Pattern = "(.*)\'\\(.*)" 'To Remove all [[ Trade '4379160'\ ]] prefix from audit lines
    regEx.Global = True
    validTxt = regEx.Replace(validTxt, "$2") 'Text would be ==> Aggregate 0 Create aggregate
    regEx.Pattern = "[(\t.*)](\t.*)" 'Pick only first 2 data points ==> Aggregate 0
    regEx.Global = True
    validTxt1 = regEx.Replace(validTxt, vbCr)
    arrValidTxt = Split(validTxt1, vbCrLf) 'To Remove the first 2 header lines, split it based on new line
    Set objFile = objFSO.OpenTextFile(fileInput, 2)
    For i = 2 To (Ubound(arrValidTxt) - 1) 'Ignore first 2 header lines
        objFile.WriteLine arrValidTxt(i)
    Next
    objFile.Close
    Set objFile = Nothing
    Set regEx = Nothing
    Set objFSO = Nothing
End Sub
Call SanitiseTradeAudit("C:\Users\pankaj.jaju\Desktop\ActualAuditMessage.txt")
My question is: can this regex replacement be done with a single pattern?
Answer 0: (score: 1)
If you process the file line by line, a pattern like this should work:
^.*?\\([^\t]*)\t([^\t]*)
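The above matches everything up to the first backslash (non-greedy match), followed by two groups of zero or more non-tab characters (greedy match) separated by a single tab.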
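As a quick check of what the two capture groups hold, here is a minimal sketch that runs the pattern against two made-up lines shaped like the sample data. It assumes the fields are separated by single tabs, and the function name `FirstTwoFields` is hypothetical, not part of the original script.

```vbscript
' Hypothetical helper: returns the first two tab-separated fields that
' follow the first backslash, or an empty string if the line does not match.
Function FirstTwoFields(line)
    Dim re, matches
    Set re = New RegExp
    re.Pattern = "^.*?\\([^\t]*)\t([^\t]*)"
    Set matches = re.Execute(line)
    If matches.Count > 0 Then
        FirstTwoFields = matches(0).SubMatches(0) & vbTab & matches(0).SubMatches(1)
    Else
        FirstTwoFields = ""
    End If
End Function

' Assumed input: fields separated by single tabs, as the pattern requires.
WScript.Echo FirstTwoFields("Trade '4379160'\Acquire Day" & vbTab & "2015-05-07" & vbTab & "Create" & vbTab & "acquire_day")
WScript.Echo FirstTwoFields("Trade '4379160'\Assinf" & vbTab & vbTab & "Create" & vbTab & "assinf")
```

For the second line the second group is empty, so the result is Assinf followed by a tab and nothing else. A line with no backslash, or with no tab after the backslash, does not match at all.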
Sample code:
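Set objFSO = CreateObject("Scripting.FileSystemObject")
Set re = New RegExp
re.Pattern = "^.*?\\([^\t]*)\t([^\t]*)"
' Read the whole file first, then reopen the same file for writing (2 = ForWriting)
txt = objFSO.OpenTextFile(fileInput).ReadAll
Set objFile = objFSO.OpenTextFile(fileInput, 2)
For Each line In Split(txt, vbNewLine)
    ' Lines without a backslash followed by a tab (e.g. headers) yield no match and are skipped
    For Each m In re.Execute(line)
        objFile.WriteLine m.SubMatches(0) & vbTab & m.SubMatches(1)
    Next
Next
objFile.Close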
If you need to handle large files, I would drop ReadAll entirely and read the input file line by line to avoid running out of memory:
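Set objFSO = CreateObject("Scripting.FileSystemObject")
Set re = New RegExp
re.Pattern = "^.*?\\([^\t]*)\t([^\t]*)"
' fileOutput must be a different path than fileInput, since both files are open at once
Set inFile = objFSO.OpenTextFile(fileInput)
Set outFile = objFSO.OpenTextFile(fileOutput, 2, True) ' 2 = ForWriting, True = create if missing
Do Until inFile.AtEndOfStream
    line = inFile.ReadLine
    For Each m In re.Execute(line)
        outFile.WriteLine m.SubMatches(0) & vbTab & m.SubMatches(1)
    Next
Loop
inFile.Close
outFile.Close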
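On the original question of doing this with a single pattern and one Replace call, the following is a possible sketch rather than a tested drop-in. It assumes tab-separated fields and that the RegExp object's MultiLine property is available (VBScript 5.5 and later). The function name `KeepFirstTwoFields` is hypothetical. The pattern uses explicit negated character classes instead of `.` so that it never crosses or consumes a line break.

```vbscript
' Hypothetical helper: rewrites every matching line of a multi-line string
' to "field1<TAB>field2" in one Replace call.
Function KeepFirstTwoFields(text)
    Dim re
    Set re = New RegExp
    re.Global = True      ' rewrite every matching line, not just the first
    re.MultiLine = True   ' let ^ match at the start of each line
    ' [^\\\r\n]*\\   everything up to the first backslash on the line
    ' groups 1 and 2 the two tab-separated fields that follow it
    ' [^\r\n]*       the rest of the line, which is dropped
    re.Pattern = "^[^\\\r\n]*\\([^\t\r\n]*)\t([^\t\r\n]*)[^\r\n]*"
    KeepFirstTwoFields = re.Replace(text, "$1" & vbTab & "$2")
End Function
```

Lines that contain no backslash, such as the two header lines, do not match and are left unchanged, so they would still have to be dropped separately.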