Examining text in AutoIt faster?

Time: 2013-09-26 16:57:33

Tags: text-files autoit export-to-csv

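I'm trying to use AutoIt to examine a text file and output selected lines to CSV. The problem I keep running into is that it takes forever. The current approach checks one line at a time. It can burn through about 5-10 lines per second, but I'm looking for something faster within the AutoIt framework.

Code: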

#include <File.au3>
$xnConfirm = False
$xnConfirmMsg = 0
while $xnConfirm = False

      $xnFile = FileOpenDialog("File to Examine...","%userprofile%","All (*.*)") ;InputBox("File???", "Which file do you want to review?","C:\")
   If FileExists($xnFile) = True Then
            $xnConfirm = True
         Else
               $xnConfirmMsg = msgbox(1,"File Not Found...",$xnFile & " does not exist." & @crlf &  "Please select another file.")
      EndIf
WEnd

$xnConfirm = False
$xnConfirmMsg = 0
while $xnConfirm = False
   $xnTargetFile = FileOpenDialog("Location to Save to...",$xnFile & " - output.csv","All (*.*)");"%userprofile%\Documents\output.csv" 
                  ;FileSaveDialog("Location to Save to...","%userprofile%","All (*.*)",16,"output - " & $xnFile & " - output.csv") ;
         Consolewrite("Outputting to " & $xnTargetFile & @crlf)

      if fileexists($xnTargetFile) then
            $xnConfirmMsg = msgbox(4,"Overwrite?","Are you sure you want to overwrite " & @crlf & $xnTargetFile)

               if $xnConfirmMsg = 6 Then
                  $xnConfirm = True
                  filedelete($xnTargetFile)               
               EndIf
            Else   

               $xnConfirm = True

      EndIf    
WEnd

progresson("Line count","Verifying the number of lines in " & $xnFile)
$xnFileLine = _FileCountLines($xnFile) ;InputBox("Number of lines","How many lines are in this document?",10000)
consolewrite("Loading "& $xnFile & " with " & $xnFileLine & " total lines." &  @crlf)
progressoff()

local $hfl = FileOpen($xnFile,0)
FileWrite($xnTargetFile,"")
FileOpen($xnTargetFile, 1)

$i = 1

ProgressOn("Creating CSV","Extracting matching data.","",0,0,16)
$xnTargetLine = 1

FileWriteLine($xnTargetFile,"Timestamp,Message,Category,Priority,EventId,Severity,Title,Machine,App Domain,ProcessID,Process Name,Thread Name,Win32 ThreadId")

While $i < $xnFileLine

                  ;$xnCurrentLine = FileReadLine($xnFile,$i) ;Old Settings
            $xnCurrentLine = FileReadLine($hfl,$i)
            ;MsgBox(1,"",$xnCurrentLine)

      Select
         Case stringinstr($xnCurrentLine,"Timestamp:")
            $xnTargetLine = stringmid($xnCurrentLine,12,stringlen($xnCurrentLine) - 12 + 1) & "," 
         Case stringinstr($xnCurrentLine,"Message:")
            $xnTargetLine = $xnTargetLine  & stringmid($xnCurrentLine,10,stringlen($xnCurrentLine) - 10 + 1) & ","
         Case stringinstr($xnCurrentLine,"Category:")
            $xnTargetLine = $xnTargetLine  & stringmid($xnCurrentLine,11,stringlen($xnCurrentLine) - 11 + 1) & ","
         Case stringinstr($xnCurrentLine,"Win32 ThreadId:")
            $xnTargetLine = $xnTargetLine  & stringmid($xnCurrentLine,16,stringlen($xnCurrentLine) - 16 + 1) & @crlf
               FileWriteLine($xnTargetFile,$xnTargetLine)
         case Else
               consolewrite("Nothing on line " & $i & @crlf)
      EndSelect
         $i = $i + 1
                     ProgressSet(round($i/$xnFileLine * 100,1),$i & " of " & $xnFileLine & " lines examined." & @cr & "Thank you for your patience.")
   WEnd
ProgressOff()

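To give some context, I'm reading a log file that resembles a trace log. I want the events output to CSV so that I can examine trends. The format in the log file looks like this: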

Timestamp: 9/26/2013 3:33:23 AM

Message: Log Event Received

Category: Transaction

Win32 ThreadId:2872

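I know this is formatted as code, but I wanted it to be easier to read.

3 answers:

Answer 0 (score: 2)

I'm not sure whether it's really faster, but you could use a regular expression. If you could tell me what these rules are meant to match: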

         Case stringinstr($xnCurrentLine,"Timestamp:")
        $xnTargetLine = stringmid($xnCurrentLine,12,stringlen($xnCurrentLine) - 12 + 1) & "," 
     Case stringinstr($xnCurrentLine,"Message:")
        $xnTargetLine = $xnTargetLine  & stringmid($xnCurrentLine,10,stringlen($xnCurrentLine) - 10 + 1) & ","
     Case stringinstr($xnCurrentLine,"Category:")
        $xnTargetLine = $xnTargetLine  & stringmid($xnCurrentLine,11,stringlen($xnCurrentLine) - 11 + 1) & ","
     Case stringinstr($xnCurrentLine,"Win32 ThreadId:")
        $xnTargetLine = $xnTargetLine  & stringmid($xnCurrentLine,16,stringlen($xnCurrentLine) - 16 + 1) & @crlf
           FileWriteLine($xnTargetFile,$xnTargetLine)
     case Else
           consolewrite("Nothing on line " & $i & @crlf)

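If you can give me 2 or 3 sample lines, I can try to write you a regular expression function, which I think would be faster.

Edit:

I made a sample script. If the input file looks like this: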

Timestamp: 9/26/2013 3:33:23 AM
Message: Log Event Received
Category: Transaction
Win32 ThreadId:2872

then this script works fine:

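#include <Array.au3>
Local $file = FileOpen("InputFile.txt", 0) ; 0 = read mode
$sText = FileRead($file) ; read the whole file in one call
FileClose($file)
; Flag 3 returns an array of every captured value across all matches
$aSnippets = StringRegExp($sText,"(?:Timestamp:|Message:|Category:|Win32 ThreadId:)(?: )?(.+)",3)
_ArrayDisplay($aSnippets)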

The result is an array containing:

[0] = 9/26/2013 3:33:23 AM
[1] = Log Event Received
[2] = Transaction
[3] = 2872
etc.

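If you want to merge those 4 rows into one, try a For loop (I can write one for you if you like).

For 100 lines, it takes 0.490570878768441 milliseconds to store every value in an array.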
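The For loop mentioned above could look roughly like the following. This is only a sketch under some assumptions: the inline sample text stands in for `FileRead()` on the real log, every record is assumed to contain exactly the four fields in the same order, `output.csv` is a placeholder name, and values containing commas are not quoted.

```autoit
; Sample text standing in for FileRead() on the real log (assumption).
Local $sText = "Timestamp: 9/26/2013 3:33:23 AM" & @CRLF & _
        "Message: Log Event Received" & @CRLF & _
        "Category: Transaction" & @CRLF & _
        "Win32 ThreadId:2872" & @CRLF

; [^\r\n]+ keeps each capture on one line whatever the line-ending style.
Local $aSnippets = StringRegExp($sText, "(?:Timestamp:|Message:|Category:|Win32 ThreadId:) ?([^\r\n]+)", 3)
If @error Then
    ConsoleWrite("No matching lines found." & @CRLF)
    Exit 1
EndIf

; Walk the flat array four values at a time and build one CSV row per record.
Local $sCsv = ""
For $i = 0 To UBound($aSnippets) - 4 Step 4
    $sCsv &= $aSnippets[$i] & "," & $aSnippets[$i + 1] & "," & _
            $aSnippets[$i + 2] & "," & $aSnippets[$i + 3] & @CRLF
Next

; Write everything in a single call instead of once per line.
Local $hOut = FileOpen("output.csv", 2) ; 2 = write mode, erases previous contents
FileWrite($hOut, $sCsv)
FileClose($hOut)
```

Stepping by 4 and stopping at `UBound($aSnippets) - 4` means an incomplete trailing record is skipped instead of causing an out-of-range array access.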

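Answer 1 (score: 2)

(I wanted to add a comment asking for a sample of the data being read, but I don't have enough reputation yet...)

Depending on the size of the input file, I'd suggest using _FileReadToArray() to read the whole file into an array in one go, and then looping over the array in memory (rather than accessing the file throughout the whole process). Also, I wouldn't write to the output file on every iteration; I'd build up a string and save it once at the end.

Something like: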

#include <File.au3>

Local $inputFileData, $outputFileData = ""
; _FileReadToArray fills the array passed as its second (ByRef) argument;
; element [0] holds the number of lines read
_FileReadToArray($xnFile, $inputFileData)

For $Counter = 1 to $inputFileData[0]

      $tmpLine = $inputFileData[$Counter]

      Select

         ; Append (&=) in every case so earlier records are not overwritten
         Case stringinstr($tmpLine,"Timestamp:")
            $outputFileData &= stringmid($tmpLine,12,stringlen($tmpLine) - 12 + 1) & ","

         Case stringinstr($tmpLine,"Message:")
            $outputFileData &= stringmid($tmpLine,10,stringlen($tmpLine) - 10 + 1) & ","

         Case stringinstr($tmpLine,"Category:")
            $outputFileData &= stringmid($tmpLine,11,stringlen($tmpLine) - 11 + 1) & ","

         Case stringinstr($tmpLine,"Win32 ThreadId:")
            $outputFileData &= stringmid($tmpLine,16,stringlen($tmpLine) - 16 + 1) & @CRLF

         case Else
              ConsoleWrite("Nothing on line " & $Counter & @crlf)

      EndSelect

Next

; The string already ends with @CRLF, so write it as-is in a single call
FileWrite($xnTargetFile, $outputFileData)

(Note that I haven't included any error checking, nor have I tested this for errors :)
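As a rough illustration of the error checking left out above, the read and the write could each be guarded as follows. This is a sketch, not a tested drop-in: the two file paths are placeholders, and the loop body handles only the Timestamp field to keep the example short.

```autoit
#include <File.au3>

; Placeholder paths for illustration only.
Local $sInput = "InputFile.txt", $sOutput = "output.csv"

; _FileReadToArray sets @error when the file cannot be opened or is empty.
Local $aLines
_FileReadToArray($sInput, $aLines)
If @error Then
    ConsoleWrite("Could not read " & $sInput & @CRLF)
    Exit 1
EndIf

; Build the whole output in memory ($aLines[0] holds the line count).
Local $sCsv = ""
For $n = 1 To $aLines[0]
    ; Only one field is handled here; the other cases would follow the same pattern.
    If StringInStr($aLines[$n], "Timestamp:") Then $sCsv &= StringMid($aLines[$n], 12) & @CRLF
Next

; FileOpen returns -1 on failure, e.g. when the target is not writable.
Local $hOut = FileOpen($sOutput, 2) ; 2 = write mode, erases previous contents
If $hOut = -1 Then
    ConsoleWrite("Could not open " & $sOutput & " for writing." & @CRLF)
    Exit 2
EndIf
FileWrite($hOut, $sCsv)
FileClose($hOut)
```

Opening the output once and writing through the handle avoids reopening the file for every record.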

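Answer 2 (score: 0)

Here is another possible idea.

You could copy the input file, rename it, and then remove everything you don't need from it. That is very easy with regular expressions, and possibly even faster.

If you show me a sample of the input file and what the output file should look like, I can give it a try :)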
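One way to read this suggestion is a single StringRegExpReplace pass over the whole text that drops the labels and line breaks and keeps only the values. The sketch below assumes the four labelled lines of a record appear consecutively, as in the sample from the question. The inline sample text is a stand-in for the real file, and any lines the pattern does not match are left unchanged.

```autoit
; Sample text standing in for the copied log file (assumption).
Local $sText = "Timestamp: 9/26/2013 3:33:23 AM" & @CRLF & _
        "Message: Log Event Received" & @CRLF & _
        "Category: Transaction" & @CRLF & _
        "Win32 ThreadId:2872" & @CRLF

; Each group captures one value; [\r\n]+ consumes the line break(s) between fields.
Local $sPattern = "Timestamp: ?([^\r\n]*)[\r\n]+" & _
        "Message: ?([^\r\n]*)[\r\n]+" & _
        "Category: ?([^\r\n]*)[\r\n]+" & _
        "Win32 ThreadId: ?([^\r\n]*)"

; Replace every four-line record with one comma-separated line.
Local $sCsv = StringRegExpReplace($sText, $sPattern, "$1,$2,$3,$4")
ConsoleWrite($sCsv)
```

If the real log has other fields between these four, the pattern would need to be extended to skip them.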