I have a log file like this:
some strings...
<FX>
another strings...
<FX>
<TEG1>
<TEG2>
</TEG2>
</TEG1>
</FX>
some strings...
<FX>
<FX>
<TEG1>
</TEG1>
</FX>
I need to parse it and get this result:
<FX>
<TEG1>
<TEG2>
</TEG2>
</TEG1>
</FX>
and
<FX>
<TEG3>
</TEG3>
</FX>
I have already written this regular expression:
<FX>([\s\S]+?)</FX>
but it returns these matches:
<FX>
another strings...
<FX>
<TEG1>
<TEG2>
</TEG2>
</TEG1>
</FX>
and
<FX>
<FX>
<TEG1>
</TEG1>
</FX>
Can anyone help me with the regular expression? Thanks for your support.
Answer 0 (score: 0)
Depending on what is hidden behind "another strings", you may get away with:
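' goFS is assumed to be a Scripting.FileSystemObject; it is created here so the snippet runs on its own
Dim goFS : Set goFS = CreateObject("Scripting.FileSystemObject")
Dim sAll : sAll = goFS.OpenTextFile("..\data\15168620.txt").ReadAll()
WScript.Echo sAll
WScript.Echo "--------"
Dim reX : Set reX = New RegExp
reX.Global = True
' Skip lazily from an outer <FX> to the next <FX>, then capture from there to the first </FX>
reX.Pattern = "<FX>[\s\S]*?(<FX>[\s\S]+?</FX>)"
Dim oMTS : Set oMTS = reX.Execute(sAll)
Dim oMT
For Each oMT In oMTS
    WScript.Echo oMT.SubMatches(0)
    WScript.Echo "--------"
Next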
Output:
some strings...
<FX>
another strings...
<FX>
<TEG1>
<TEG2>
</TEG2>
</TEG1>
</FX>
some strings...
<FX>
<FX>
<TEG1>
</TEG1>
</FX>
--------
<FX>
<TEG1>
<TEG2>
</TEG2>
</TEG1>
</FX>
--------
<FX>
<TEG1>
</TEG1>
</FX>
--------
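As an illustration only, here is the same pattern exercised in Python rather than VBScript, so it is not the answerer's code. The second pattern is a hypothetical alternative that the answer does not propose: it refuses to step over another `<FX>` between the opening and closing tag, so it does not depend on every wanted block being preceded by a stray `<FX>`. The names `LOG`, `ANSWER_PATTERN`, `TEMPERED_PATTERN` and both functions are made up for this sketch.

```python
import re

# The sample log from the question.
LOG = """some strings...
<FX>
another strings...
<FX>
<TEG1>
<TEG2>
</TEG2>
</TEG1>
</FX>
some strings...
<FX>
<FX>
<TEG1>
</TEG1>
</FX>
"""

# Same pattern as in the VBScript above: group 1 holds the inner block.
ANSWER_PATTERN = re.compile(r"<FX>[\s\S]*?(<FX>[\s\S]+?</FX>)")

# Assumed alternative: each consumed character must not start a new "<FX>",
# so a match can only begin at the last <FX> before a </FX>.
TEMPERED_PATTERN = re.compile(r"<FX>(?:(?!<FX>)[\s\S])*?</FX>")


def inner_blocks_answer(text):
    """Return the captured inner blocks using the answer's pattern."""
    return ANSWER_PATTERN.findall(text)


def inner_blocks_tempered(text):
    """Return innermost <FX>...</FX> blocks using the lookahead variant."""
    return TEMPERED_PATTERN.findall(text)


if __name__ == "__main__":
    for block in inner_blocks_tempered(LOG):
        print(block)
        print("--------")
```

Both functions read the whole text into memory, so neither is a good fit for very large files.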
Update:
I still hope we can avoid the pedestrian approach:
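' goFS is assumed to be a Scripting.FileSystemObject; it is created here so the snippet runs on its own
Dim goFS : Set goFS = CreateObject("Scripting.FileSystemObject")
Dim sAll : sAll = goFS.OpenTextFile("..\data\15168620-2.txt").ReadAll()
WScript.Echo sAll
WScript.Echo "--------"
' Split on "FX>": a piece that ends in "</" is the body of an innermost block
Dim aAll : aAll = Split(sAll, "FX>")
Dim sTry
For Each sTry In aAll
    If "</" = Right(sTry, 2) Then
        WScript.Echo "<FX>" & sTry & "FX>"
        WScript.Echo "--------"
    End If
Next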
Output:
some strings...
<FX>
another <FX> strings...
<FX><FX><FX><FX><FX>
<FX>
<FX>
<TEG1>
<TEG2>
</TEG2>
</TEG1>
</FX>
some strings...
<FX>
<FX>
<TEG1>
</TEG1>
</FX>
--------
<FX>
<TEG1>
<TEG2>
</TEG2>
</TEG1>
</FX>
--------
<FX>
<TEG1>
</TEG1>
</FX>
--------
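The split trick can be sketched in Python as well (an illustration, not the answerer's code; `SAMPLE` and `blocks_by_split` are names invented here). Splitting on `FX>` leaves a piece ending in `</` exactly where a closing `</FX>` followed the text since the previous `FX>`, and the sketch assumes, as the VBScript does, that the previous `FX>` belonged to an opening `<FX>`.

```python
# Sample data mirroring the second test file shown in the output above.
SAMPLE = """some strings...
<FX>
another <FX> strings...
<FX><FX><FX><FX><FX>
<FX>
<FX>
<TEG1>
<TEG2>
</TEG2>
</TEG1>
</FX>
some strings...
<FX>
<FX>
<TEG1>
</TEG1>
</FX>
"""


def blocks_by_split(text, tag="FX"):
    """Return innermost <tag>...</tag> blocks by splitting on 'tag>'."""
    marker = tag + ">"
    blocks = []
    for piece in text.split(marker):
        # A piece ending in "</" was cut off right before the closing marker.
        if piece.endswith("</"):
            blocks.append("<" + marker + piece + marker)
    return blocks


if __name__ == "__main__":
    for block in blocks_by_split(SAMPLE):
        print(block)
        print("--------")
```

Two closing tags in a row, or another tag whose name ends in `FX`, would break that assumption and produce a wrong block.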
Update II:
The pedestrian approach: read the file line by line, start a new collection on each <FX>, and process/output the collection on each </FX>:
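' goFS is assumed to be a Scripting.FileSystemObject; it is created here so the snippet runs on its own
Dim goFS : Set goFS = CreateObject("Scripting.FileSystemObject")
Dim alLines : Set alLines = CreateObject("System.Collections.ArrayList")
alLines.Capacity = 500
Dim oTS : Set oTS = goFS.OpenTextFile("..\data\15168620-2.txt")
Do Until oTS.AtEndOfStream
    Dim sLine : sLine = oTS.Readline()
    Select Case True
        Case "<FX>" = Left(sLine, 4)
            ' A line starting with <FX> discards everything collected so far
            alLines.Clear
            alLines.Add sLine
        Case "</FX>" = Left(sLine, 5)
            ' A line starting with </FX> completes the block: output it
            alLines.Add sLine
            WScript.Echo Join(alLines.ToArray(), vbCrLf)
            WScript.Echo "--------"
        Case Else
            alLines.Add sLine
    End Select
Loop
oTS.Close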
Output:
<FX>
<TEG1>
<TEG2>
</TEG2>
</TEG1>
</FX>
--------
<FX>
<TEG1>
</TEG1>
</FX>
--------
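A rough Python sketch of the same line-by-line idea follows (an illustration, not the answerer's code; `SAMPLE` and `innermost_blocks` are invented names). It differs from the VBScript in one labeled respect: lines outside any candidate block are dropped instead of being collected, which keeps memory use small when the function is fed an open file object.

```python
# Sample data mirroring the second test file used above.
SAMPLE = """some strings...
<FX>
another <FX> strings...
<FX><FX><FX><FX><FX>
<FX>
<FX>
<TEG1>
<TEG2>
</TEG2>
</TEG1>
</FX>
some strings...
<FX>
<FX>
<TEG1>
</TEG1>
</FX>
"""


def innermost_blocks(lines, open_tag="<FX>", close_tag="</FX>"):
    """Yield each block that runs from the last open_tag line to a close_tag line.

    `lines` can be any iterable of lines, for example an open text file.
    """
    current = None  # None means "not inside a candidate block"
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(open_tag):
            # A new opening line restarts the collection.
            current = [line]
        elif current is not None:
            current.append(line)
            if line.startswith(close_tag):
                yield "\n".join(current)
                current = None


if __name__ == "__main__":
    for block in innermost_blocks(SAMPLE.splitlines(True)):
        print(block)
        print("--------")
```

Because it is a generator, only the lines of the block currently being collected are held in memory.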
Answer 1 (score: 0)
For such a huge file (10 GB), RegExp is worthless. Here is my idea.
' StripInvalidXML.vbs
Option Explicit
Const ForReading = 1, ForWriting = 2, ForAppending = 8
Const TristateUseDefault = -2, TristateTrue = -1, TristateFalse = 0
Const TAG_OPEN = "<FX>", TAG_CLOSE = "</FX>"
Dim fso, fin, fout
Dim sLine, sBlock
Set fso = CreateObject("Scripting.FileSystemObject")
Set fin = fso.OpenTextFile("input_log.xml", ForReading, False)
' ForAppending with create = True: repeated runs add to an existing output_log.xml
Set fout = fso.OpenTextFile("output_log.xml", ForAppending, True)
Do Until fin.AtEndOfStream
    sLine = fin.ReadLine
    If sLine = TAG_OPEN Then
        ' A new <FX> line discards whatever was collected so far
        sBlock = sLine
    Else
        sBlock = sBlock & sLine
    End If
    sBlock = sBlock & vbNewLine
    If sLine = TAG_CLOSE Then
        ' sBlock already ends with a line break, so WriteLine leaves a blank line after each block
        fout.WriteLine sBlock
    End If
Loop
fin.Close
fout.Close