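I am trying to use AutoIt to examine a text file and output selected lines as CSV. The problem I keep running into is that it takes forever. The current approach examines one line at a time. It gets through roughly 5-10 lines per second, but I'm looking for something faster within the AutoIt framework.
Code: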
#include <File.au3>
$xnConfirm = False
$xnConfirmMsg = 0
while $xnConfirm = False
$xnFile = FileOpenDialog("File to Examine...","%userprofile%","All (*.*)") ;InputBox("File???", "Which file do you want to review?","C:\")
If FileExists($xnFile) = True Then
$xnConfirm = True
Else
$xnConfirmMsg = msgbox(1,"File Not Found...",$xnFile & " does not exist." & @crlf & "Please select another file.")
EndIf
WEnd
$xnConfirm = False
$xnConfirmMsg = 0
while $xnConfirm = False
$xnTargetFile = FileOpenDialog("Location to Save to...",$xnFile & " - output.csv","All (*.*)");"%userprofile%\Documents\output.csv"
;FileSaveDialog("Location to Save to...","%userprofile%","All (*.*)",16,"output - " & $xnFile & " - output.csv") ;
Consolewrite("Outputting to " & $xnTargetFile & @crlf)
if fileexists($xnTargetFile) then
$xnConfirmMsg = msgbox(4,"Overwrite?","Are you sure you want to overwrite " & @crlf & $xnTargetFile)
if $xnConfirmMsg = 6 Then
$xnConfirm = True
filedelete($xnTargetFile)
EndIf
Else
$xnConfirm = True
EndIf
WEnd
progresson("Line count","Verifying the number of lines in " & $xnFile)
$xnFileLine = _FileCountLines($xnFile) ;InputBox("Number of lines","How many lines are in this document?",10000)
consolewrite("Loading "& $xnFile & " with " & $xnFileLine & " total lines." & @crlf)
progressoff()
local $hfl = FileOpen($xnFile,0)
FileWrite($xnTargetFile,"")
FileOpen($xnTargetFile, 1)
$i = 1
ProgressOn("Creating CSV","Extracting matching data.","",0,0,16)
$xnTargetLine = 1
FileWriteLine($xnTargetFile,"Timestamp,Message,Category,Priority,EventId,Severity,Title,Machine,App Domain,ProcessID,Process Name,Thread Name,Win32 ThreadId")
While $i < $xnFileLine
;$xnCurrentLine = FileReadLine($xnFile,$i) ;Old Settings
$xnCurrentLine = FileReadLine($hfl,$i)
;MsgBox(1,"",$xnCurrentLine)
Select
Case stringinstr($xnCurrentLine,"Timestamp:")
$xnTargetLine = stringmid($xnCurrentLine,12,stringlen($xnCurrentLine) - 12 + 1) & ","
Case stringinstr($xnCurrentLine,"Message:")
$xnTargetLine = $xnTargetLine & stringmid($xnCurrentLine,10,stringlen($xnCurrentLine) - 10 + 1) & ","
Case stringinstr($xnCurrentLine,"Category:")
$xnTargetLine = $xnTargetLine & stringmid($xnCurrentLine,11,stringlen($xnCurrentLine) - 11 + 1) & ","
Case stringinstr($xnCurrentLine,"Win32 ThreadId:")
$xnTargetLine = $xnTargetLine & stringmid($xnCurrentLine,16,stringlen($xnCurrentLine) - 16 + 1) & @crlf
FileWriteLine($xnTargetFile,$xnTargetLine)
case Else
consolewrite("Nothing on line " & $i & @crlf)
EndSelect
$i = $i + 1
ProgressSet(round($i/$xnFileLine * 100,1),$i & " of " & $xnFileLine & " lines examined." & @cr & "Thank you for your patience.")
WEnd
ProgressOff()
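For context, I'm reading a log file that is similar to a trace log. I want the events output to CSV so that I can examine trends. The format in the log file looks like this: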
Timestamp: 9/26/2013 3:33:23 AM
Message: Log Event Received
Category: Transaction
Win32 ThreadId:2872
I know this is formatted as code, but I was hoping that would make it easier to read.
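Answer 0 (score: 2)
I'm not sure whether it is really faster, but you could use a regular expression, if you can tell me what the rules behind this part are: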
Case stringinstr($xnCurrentLine,"Timestamp:")
$xnTargetLine = stringmid($xnCurrentLine,12,stringlen($xnCurrentLine) - 12 + 1) & ","
Case stringinstr($xnCurrentLine,"Message:")
$xnTargetLine = $xnTargetLine & stringmid($xnCurrentLine,10,stringlen($xnCurrentLine) - 10 + 1) & ","
Case stringinstr($xnCurrentLine,"Category:")
$xnTargetLine = $xnTargetLine & stringmid($xnCurrentLine,11,stringlen($xnCurrentLine) - 11 + 1) & ","
Case stringinstr($xnCurrentLine,"Win32 ThreadId:")
$xnTargetLine = $xnTargetLine & stringmid($xnCurrentLine,16,stringlen($xnCurrentLine) - 16 + 1) & @crlf
FileWriteLine($xnTargetFile,$xnTargetLine)
case Else
consolewrite("Nothing on line " & $i & @crlf)
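If you can give me 2 or 3 sample lines, I can try to write you a regular expression function, which I think would be faster.
Edit:
I made a sample script. If the input file looks like this: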
Timestamp: 9/26/2013 3:33:23 AM
Message: Log Event Received
Category: Transaction
Win32 ThreadId:2872
then this script works fine:
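#include <Array.au3>
; Read the whole file into memory in one call, then release the handle.
Local $file = FileOpen("InputFile.txt", 0)
Local $sText = FileRead($file)
FileClose($file)
; Flag 3 returns an array with the captured value of every match in the text.
Local $aSnippets = StringRegExp($sText, "(?:Timestamp:|Message:|Category:|Win32 ThreadId:)(?: )?(.+)", 3)
_ArrayDisplay($aSnippets)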
The result is an array containing the following:
[0] = 9/26/2013 3:33:23 AM
[1] = Log Event Received
[2] = Transaction
[3] = 2872
etc.
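If you want to merge these 4 lines into one, try a For loop (I can write one for you if you like).
For 100 lines, it takes 0.490570878768441 milliseconds to store every value in an array.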
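The For loop mentioned above might look like the following. This is only a sketch: it assumes every record yields exactly four captures, in the order Timestamp, Message, Category, Win32 ThreadId, and output.csv is a placeholder file name that does not come from the question.

```autoit
; Read the input in one call (0 = read mode).
Local $hIn = FileOpen("InputFile.txt", 0)
If $hIn = -1 Then Exit 1
Local $sText = FileRead($hIn)
FileClose($hIn)

; Same pattern as above; flag 3 returns all captures as a flat, 0-based array.
Local $aSnippets = StringRegExp($sText, "(?:Timestamp:|Message:|Category:|Win32 ThreadId:)(?: )?(.+)", 3)
If @error Then
    ConsoleWrite("No matching lines found." & @CRLF)
    Exit 1
EndIf

; Assumption: four captures per record. Join each group of four into one CSV row.
; An incomplete group at the end of the array is skipped.
Local $sCsv = "Timestamp,Message,Category,Win32 ThreadId" & @CRLF
For $i = 0 To UBound($aSnippets) - 4 Step 4
    $sCsv &= $aSnippets[$i] & "," & $aSnippets[$i + 1] & "," & $aSnippets[$i + 2] & "," & $aSnippets[$i + 3] & @CRLF
Next

; "output.csv" is a hypothetical name. Open once (2 = overwrite), write once, close.
Local $hOut = FileOpen("output.csv", 2)
If $hOut = -1 Then Exit 1
FileWrite($hOut, $sCsv)
FileClose($hOut)
```

Values that themselves contain commas would need to be quoted before being joined, which this sketch does not do.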
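Answer 1 (score: 2)
(I wanted to add a comment asking for a sample of the data being read, but I don't have enough reputation yet...)
Depending on the size of the input file, I suggest using _FileReadToArray() to read the whole file into an array in one go, and then looping over the array in memory (rather than keeping the file open for the whole run). I also wouldn't write to the output file on every iteration; I'd build up a string and save it once at the end.
Something like this: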
; Requires #include <File.au3>, which the question's script already has.
Local $outputFileData = ""
Local $inputFileData, $tmpLine
; _FileReadToArray fills the array passed as its second argument; element [0] holds the line count.
_FileReadToArray($xnFile, $inputFileData)
For $Counter = 1 To $inputFileData[0]
    $tmpLine = $inputFileData[$Counter]
    Select
        Case StringInStr($tmpLine, "Timestamp:")
            ; Append (&=) so that earlier records are not overwritten.
            $outputFileData &= StringMid($tmpLine, 12) & ","
        Case StringInStr($tmpLine, "Message:")
            $outputFileData &= StringMid($tmpLine, 10) & ","
        Case StringInStr($tmpLine, "Category:")
            $outputFileData &= StringMid($tmpLine, 11) & ","
        Case StringInStr($tmpLine, "Win32 ThreadId:")
            ; Last field of a record: end the CSV row.
            $outputFileData &= StringMid($tmpLine, 16) & @CRLF
        Case Else
            ConsoleWrite("Nothing on line " & $Counter & @CRLF)
    EndSelect
Next
; Write the accumulated rows in a single call.
FileWrite($xnTargetFile, $outputFileData)
(Note that I haven't included any error checking, and I haven't tested this for errors either :)
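The error checking left out above could be sketched roughly as follows. The file names are placeholders, and the sketch assumes each label sits at the very start of its line, as in the sample data; it is not a tested drop-in replacement.

```autoit
#include <File.au3>

; Hypothetical paths; the question's script gets them from dialogs.
Local $sInput = "InputFile.txt", $sOutput = "output.csv"

; _FileReadToArray fills $aLines by reference and sets @error on failure.
Local $aLines
_FileReadToArray($sInput, $aLines)
If @error Then
    ConsoleWrite("Could not read " & $sInput & " (@error " & @error & ")" & @CRLF)
    Exit 1
EndIf

; Open the output once (2 = overwrite); FileOpen returns -1 on failure.
Local $hOut = FileOpen($sOutput, 2)
If $hOut = -1 Then
    ConsoleWrite("Could not open " & $sOutput & " for writing." & @CRLF)
    Exit 1
EndIf

Local $sCsv = "", $aMatch
For $n = 1 To $aLines[0]
    ; Flag 1 returns the captures of the first match: [0] = label, [1] = value.
    $aMatch = StringRegExp($aLines[$n], "^(Timestamp|Message|Category|Win32 ThreadId):\h*(.*)$", 1)
    If @error Then ContinueLoop ; not one of the wanted lines
    If $aMatch[0] = "Win32 ThreadId" Then
        $sCsv &= $aMatch[1] & @CRLF ; last field: end the row
    Else
        $sCsv &= $aMatch[1] & ","
    EndIf
Next

; FileWrite returns 0 on failure.
If Not FileWrite($hOut, $sCsv) Then ConsoleWrite("Writing " & $sOutput & " failed." & @CRLF)
FileClose($hOut)
```

Reading everything once also avoids passing an incrementing line number to FileReadLine, as the question's loop does. The AutoIt documentation warns that this forces the file to be re-read from the beginning on every call.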
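Answer 2 (score: 0)
Here is another possible idea.
You could copy the input file, rename the copy, and then delete all the data you don't need from it. That is very easy with regular expressions, and may even be faster.
If you show me an example of the input file and what the output file should look like, I can give it a try :)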
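Without a sample to go on, one way this idea could be sketched is below. It assumes the wanted lines start with the four labels from the question, and both file names are hypothetical. Instead of copying and then editing the file, it writes the filtered text to a new file, which likewise leaves the original untouched.

```autoit
; Hypothetical file names, for illustration only.
Local $sInput = "InputFile.txt", $sFiltered = "InputFile - filtered.txt"

Local $hIn = FileOpen($sInput, 0) ; 0 = read
If $hIn = -1 Then Exit 1
Local $sText = FileRead($hIn)
FileClose($hIn)

; (?m) lets ^ match at the start of every line. The negative lookahead skips
; lines that begin with a wanted label; every other line is deleted together
; with its line break (\R) or the end of the text (\z).
Local $sKept = StringRegExpReplace($sText, "(?m)^(?!Timestamp:|Message:|Category:|Win32 ThreadId:).*(?:\R|\z)", "")

Local $hOut = FileOpen($sFiltered, 2) ; 2 = overwrite
If $hOut = -1 Then Exit 1
FileWrite($hOut, $sKept)
FileClose($hOut)
```

The filtered file then contains only the labelled lines, which can be turned into CSV rows with either approach from the other answers.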