Extracting text from large files with PowerShell

Time: 2018-11-29 15:05:39

Tags: powershell

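We have an application that generates many large log files. I would like to parse them with PowerShell and get the output either as CSV or as text delimited by '|'. I tried Select-String but could not get the expected result. The log format and the expected result are posted below.

Log file data:

How can I achieve the above result with PowerShell?

Thanks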

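2 answers:

Answer 0: (score: 2)

As I mentioned in my comment, you need to separate the records and then match the data in each record with a fairly complex regular expression.

See the RegEx live on regex101 and study the explanation of each element in the upper-right corner of that page.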
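As a smaller illustration of the named-group technique that regular expression relies on, the sketch below matches a single line. The sample line and the variable names `$line`, `$pattern` and `$result` are assumptions made for illustration only.

```powershell
# Hypothetical one-line sample in the same "Key => Value" layout as the log.
$line = 'Start Time        => 00:02:59        Start Date     => 11/29/2018'

# Named groups (?<Name>...) become keys of the automatic $Matches hashtable.
$pattern = '^Start Time +=> (?<StartTime>[0-9:]{8}) +Start Date +=> (?<StartDate>[0-9/]{10})'

if ($line -match $pattern) {
    # Copy the captured values into an object so they can be exported later.
    $result = [PSCustomObject]@{
        StartTime = $Matches.StartTime
        StartDate = $Matches.StartDate
    }
    $result
}
```

The full expression applies the same idea to a whole record, using `(?sm)` so that `.` can cross line breaks and `^`/`$` can anchor at each line.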

This script:

## Q:\Test\2018\11\29\SO_53541952.ps1

$LogFile = '.\SO_53541952.log'
$CsvFile = '.\SO_53541952.csv'
$ExcelFile='.\SO_53541952.xlsx'

## see the regex live <https://regex101.com/r/1TWm7i/1>
$RE = [RegEx]"(?sm)^Submitter Id +=> (?<SubmitterID>.*?$).*?^Start Time +=> (?<StartTime>[0-9:]{8}) +Start Date +=> (?<StartDate>[0-9\/]{10}).*?^Message Text +=> (?<MessageText>.*?$).*?^Src File +=> (?<SrcFile>.*?$).*?^Dest File +=> (?<DestFile>.*?$)"


$Data = (Get-Content $LogFile -raw) -split "(?sm)(?=^Record Id)" | ForEach-Object {
    If ($_ -match $RE){
        [PSCustomObject]@{
            'Submitter Id' = $Matches.SubmitterId
            'Start Time'   = $Matches.StartTime
            'Start Date'   = $Matches.StartDate
            'Message Text' = $Matches.MessageText
            'Src File'     = $Matches.SrcFile
            'Dest File'    = $Matches.DestFile
        }
    }
}
$Data | Format-Table -Auto
$Data | Export-Csv $CsvFile  -NoTypeInformation -Delimiter '|'

#$Data | Out-Gridview
## with the ImportExcel module you can directly generate an excel file
$Data | Export-Excel $ExcelFile -AutoSize # -Show

produces this sample output on screen (I modified the sample data so the records are distinguishable):

> .\SO_53541952.ps1

Submitter Id Start Time Start Date Message Text           Src File Dest File
------------ ---------- ---------- ------------           -------- ---------
STMDA@432... 00:02:51   11/29/2018 Copy step successfu... File1... c\temp...
STMDA@432... 00:02:52   11/29/2018 Copy step successfu... File2... c\temp...
STMDA@432... 00:02:53   11/29/2018 Copy step successfu... File3... c\temp...
STMDA@432... 00:02:54   11/29/2018 Copy step successfu... File4... c\temp...

With Doug Finke's ImportExcel module installed, you also get an .xlsx file directly.
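The record separation used in the script relies on splitting at a zero-width lookahead, which may be easier to follow in isolation. This is a minimal sketch on a made-up two-record sample; `$text` and `$records` are illustrative names, and the `Where-Object` filter is an addition that drops the empty chunk produced before the first record.

```powershell
# Hypothetical miniature log: two records in the same layout as the real file.
$text = @'
Record Id         => STM
Submitter Id      => STMDA@4321
Record Id         => STM
Submitter Id      => STMDA@4322
'@

# (?m) lets ^ match at every line start; (?=...) consumes nothing,
# so the "Record Id" line stays at the top of each chunk.
$records = @($text -split '(?m)(?=^Record Id)' | Where-Object { $_.Trim() })

$records.Count
# Show the second line of every chunk to confirm the records were kept apart.
$records | ForEach-Object { ($_ -split '\r?\n')[1] }
```

The script above does not need the filter, because an empty chunk simply fails the `-match` test and is skipped.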

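Answer 1: (score: 1)

As LotPings suggested, you need to break the content of the log file into separate blocks. A regular expression can then capture the required values from each block and store them in objects, which can be exported to a CSV file.

Something like this: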

$log = @"
------------------------------------------------------------------------------
Record Id         => STM
Process Name      => STMDA         Stat Log Time  => 00:02:59
Process Number    => 51657           Stat Log Date  => 11/29/2018
Submitter Id      => STMDA@4322
SNode User Id     => de34fc5

Start Time        => 00:02:59        Start Date     => 11/29/2018
Stop Time         => 00:02:59        Stop Date      => 11/29/2018

SNODE             => dfdvrvbsdfgg         
Completion Code   => 0 
Message Id        => ncpa
Message Text      => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y 
FASP=> N
From Node         => P
Src File          => File2
Dest File         => c\temp2
Src CCode         => 0              Dest CCode       => 0       
Src Msgid         => ncpa       Dest Msgid       => ncpa
Bytes Read        => 4000           Bytes Written    => 4010    
Records Read      => 5              Records Written  => 5       
Bytes Sent        => 4010           Bytes Received   => 4010    
RUs Sent          => 0              RUs Received     => 1       
------------------------------------------------------------------------------
Record Id         => STM
Process Name      => STMDA         Stat Log Time  => 00:02:59
Process Number    => 51657           Stat Log Date  => 11/29/2018
Submitter Id      => STMDA@4321
SNode User Id     => de34fc5

Start Time        => 00:02:59        Start Date     => 11/29/2018
Stop Time         => 00:02:59        Stop Date      => 11/29/2018

SNODE             => dfdvrvbsdfgg         
Completion Code   => 0 
Message Id        => ncpa
Message Text      => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y 
FASP=> N
From Node         => P
Src File          => File1
Dest File         => c\temp1
Src CCode         => 0              Dest CCode       => 0       
Src Msgid         => ncpa       Dest Msgid       => ncpa
Bytes Read        => 4000           Bytes Written    => 4010    
Records Read      => 5              Records Written  => 5       
Bytes Sent        => 4010           Bytes Received   => 4010    
RUs Sent          => 0              RUs Received     => 1       
------------------------------------------------------------------------------
Record Id         => STM
Process Name      => STMDA         Stat Log Time  => 00:02:59
Process Number    => 51657           Stat Log Date  => 11/29/2018
Submitter Id      => STMDA@4323
SNode User Id     => de34fc5

Start Time        => 00:02:59        Start Date     => 11/29/2018
Stop Time         => 00:02:59        Stop Date      => 11/29/2018

SNODE             => dfdvrvbsdfgg         
Completion Code   => 0 
Message Id        => ncpa
Message Text      => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y 
FASP=> N
From Node         => P
Src File          => File3
Dest File         => c\temp3
Src CCode         => 0              Dest CCode       => 0       
Src Msgid         => ncpa       Dest Msgid       => ncpa
Bytes Read        => 4000           Bytes Written    => 4010    
Records Read      => 5              Records Written  => 5       
Bytes Sent        => 4010           Bytes Received   => 4010    
RUs Sent          => 0              RUs Received     => 1       
------------------------------------------------------------------------------
Record Id         => STM
Process Name      => STMDA         Stat Log Time  => 00:02:59
Process Number    => 51657           Stat Log Date  => 11/29/2018
Submitter Id      => STMDA@4324
SNode User Id     => de34fc5

Start Time        => 00:02:59        Start Date     => 11/29/2018
Stop Time         => 00:02:59        Stop Date      => 11/29/2018

SNODE             => dfdvrvbsdfgg         
Completion Code   => 0 
Message Id        => ncpa
Message Text      => Copy step successful.
Ckpt=> Y Lkfl=> N Rstr=> N XLat=> Y 
FASP=> N
From Node         => P
Src File          => File4
Dest File         => c\temp4
Src CCode         => 0              Dest CCode       => 0       
Src Msgid         => ncpa       Dest Msgid       => ncpa
Bytes Read        => 4000           Bytes Written    => 4010    
Records Read      => 5              Records Written  => 5       
Bytes Sent        => 4010           Bytes Received   => 4010    
RUs Sent          => 0              RUs Received     => 1       
------------------------------------------------------------------------------
"@

# first break the log into 'Record Id' blocks
$blocks = @()
$regex = [regex] '(?m)(Record Id[^-]+)'
$match = $regex.Match($log)
while ($match.Success) {
    $blocks += $match.Value
    $match = $match.NextMatch()
} 

# next, parse out the required values for each block and create objects to export
$blocks | ForEach-Object {
    # [^\r\n]+ captures everything up to the end of the line, so values containing ':' or other punctuation are kept whole
    if ($_ -match '(?s)Submitter Id\s+=>\s+(?<submitter>[^\s]+).+Start Time\s+=>\s+(?<starttime>[^\s]+)\s+Start Date\s+=>\s+(?<startdate>[^\s]+).+Message Text\s+=>\s+(?<messagetext>[^\r\n]+).+Src File\s+=>\s+(?<sourcefile>[^\r\n]+).+Dest File\s+=>\s+(?<destinationfile>[^\r\n]+)') {
        [PSCustomObject]@{
            'Submitter Id' = $matches['submitter']
            'Start Time'   = $matches['starttime']
            'Start Date'   = $matches['startdate']
            'Message Text' = $matches['messagetext']
            'Src File'     = $matches['sourcefile']
            'Dest File'    = $matches['destinationfile']
        }
    }
} | Export-Csv -Path '<PATH_TO_YOUR_OUTPUT_CSV>' -Delimiter '|' -NoTypeInformation

This results in a CSV file with the following content:

"Submitter Id"|"Start Time"|"Start Date"|"Message Text"|"Src File"|"Dest File"
"STMDA@4322"|"00:02:59"|"11/29/2018"|"Copy step successful."|"File2"|"c\temp2"
"STMDA@4321"|"00:02:59"|"11/29/2018"|"Copy step successful."|"File1"|"c\temp1"
"STMDA@4323"|"00:02:59"|"11/29/2018"|"Copy step successful."|"File3"|"c\temp3"
"STMDA@4324"|"00:02:59"|"11/29/2018"|"Copy step successful."|"File4"|"c\temp4"
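Since the question is about large files, it may be worth noting that both answers hold the whole log text in memory at once. The following is a sketch of a streaming alternative using `switch -Regex -File`, which reads the file line by line. It assumes that every wanted field sits on its own line and that each record starts with a `Record Id` line, as in the sample; the temp file name and the variables `$path`, `$current` and `$rows` are hypothetical.

```powershell
# Hypothetical sample written to a temp file so the sketch is self-contained.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'sketch_sample.log'
@(
    'Record Id         => STM'
    'Submitter Id      => STMDA@4321'
    'Start Time        => 00:02:59        Start Date     => 11/29/2018'
    'Message Text      => Copy step successful.'
    'Src File          => File1'
    'Dest File         => c\temp1'
    'Record Id         => STM'
    'Submitter Id      => STMDA@4322'
    'Start Time        => 00:03:10        Start Date     => 11/29/2018'
    'Message Text      => Copy step successful.'
    'Src File          => File2'
    'Dest File         => c\temp2'
) | Set-Content -Path $path

$current = $null
$rows = @(
    # switch -File processes one line at a time instead of loading the whole file.
    switch -Regex -File $path {
        '^Record Id\s+=>' {
            if ($null -ne $current) { [PSCustomObject]$current }   # emit the finished record
            $current = [ordered]@{}
            continue
        }
        '^Submitter Id\s+=>\s+(\S+)' { $current['Submitter Id'] = $Matches[1]; continue }
        '^Start Time\s+=>\s+(\S+)\s+Start Date\s+=>\s+(\S+)' {
            $current['Start Time'] = $Matches[1]
            $current['Start Date'] = $Matches[2]
            continue
        }
        '^Message Text\s+=>\s+(.+?)\s*$' { $current['Message Text'] = $Matches[1]; continue }
        '^Src File\s+=>\s+(.+?)\s*$'     { $current['Src File']     = $Matches[1]; continue }
        '^Dest File\s+=>\s+(.+?)\s*$'    { $current['Dest File']    = $Matches[1]; continue }
    }
    if ($null -ne $current) { [PSCustomObject]$current }           # emit the last record
)

# Print the rows as '|'-delimited text; Export-Csv would write them to a file instead.
$rows | ConvertTo-Csv -Delimiter '|' -NoTypeInformation
Remove-Item $path
```

Lines that match none of the patterns are ignored, so the remaining fields of each record do not need to be listed.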