Windows PowerShell: How do I parse a log file?

Time: 2020-09-04 05:28:09

Tags: powershell

I have an input file with the following content:

27/08/2020  02:47:37.365 (-0516)  hostname12    ult_licesrv       ULT  5  LiceSrv Main[108                    00000  Session 'session1' (from 'vmpms1\app1@pmc21app20.pm.com') request for 1 additional licenses for module 'SA-XT' - 1 licenses have been allocated by concurrent usage category 'Unlimited' (session module usage now 1, session category usage now 1, total module concurrent usage now 1, total category usage now 1)
27/08/2020  02:47:37.600 (-0516)  hostname13    ult_licesrv       ULT  5  LiceSrv Main[108                    00000  Session 'sssion2' (from 'vmpms2\app1@pmc21app20.pm.com') request for 1 additional licenses for module 'SA-XT-Read' - 1 licenses have been allocated by concurrent usage category 'Floating' (session module usage now 2, session category usage now 2, total module concurrent usage now 1, total category usage now 1)
27/08/2020  02:47:37.115 (-0516)  hostname141    ult_licesrv       CMN  5  Logging Housekee                    00000  Deleting old log file 'C:\Program Files\PMCOM Global\License Server\diag_ult_licesrv_20200824_011130.log.gz' as it exceeds the purge threashold of 72 hours
27/08/2020  02:47:37.115 (-0516)  hostname141    ult_licesrv       CMN  5  Logging Housekee                    00000  Deleting old log file 'C:\Program Files\PMCOM Global\License Server\diag_ult_licesrv_20200824_021310.log.gz' as it exceeds the purge threashold of 72 hours
27/08/2020  02:47:37.625 (-0516)  hostname150    ult_licesrv       ULT  5  LiceSrv Main[108                    00000  Session 'session1' (from 'vmpms1\app1@pmc21app20.pm.com') request for 1 additional licenses for module 'SA-XT' - 1 licenses have been allocated by concurrent usage category 'Unlimited' (session module usage now 2, session category usage now 1, total module concurrent usage now 2, total category usage now 1)

I need to generate an output file like this:

Date,time,hostname,session_module_usage,session_category_usage,module_concurrent_usage,total_category_usage
27/08/2020,02:47:37.365 (-0516),hostname12,1,1,1,1
27/08/2020,02:47:37.600 (-0516),hostname13,2,2,1,1
27/08/2020,02:47:37.115 (-0516),hostname141,0,0,0,0
27/08/2020,02:47:37.115 (-0516),hostname141,0,0,0,0
27/08/2020,02:47:37.625 (-0516),hostname150,2,1,2,1

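The output columns are, in order: date, time, hostname, session_module_usage, session_category_usage, module_concurrent_usage, total_category_usage.

If a line has no session_module_usage, session_category_usage, module_concurrent_usage and total_category_usage entries, output 0,0,0,0 for them.

I need to read the content from the input file and write the output to another file.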

Update

I created a file input.txt on the F: drive and pasted the log details into it. Then I formed an array by splitting the file content on newlines, as shown below.

$myList = (Get-Content -Path F:\input.txt) -split '\n'

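Now I have 5 items in the array myList. I then replace runs of whitespace with a single space and split each element on spaces to form a new array, and print array elements 0 to 3. What remains is to add the final values (session_module_usage, session_category_usage, module_concurrent_usage, total_category_usage).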

PS C:\Users\user> $myList = (Get-Content -Path F:\input.txt) -split '\n'
PS C:\Users\user> $myList.Length
5
PS C:\Users\user> for ($i = 0; $i -le ($myList.length - 1); $i += 1) {
>> $newList = ($myList[$i] -replace '\s+', ' ') -split ' '
>> $newList[0]+','+$newList[1]+' '+$newList[2]+','+$newList[3]
>>  }
27/08/2020,02:47:37.365 (-0516),hostname12
27/08/2020,02:47:37.600 (-0516),hostname13
27/08/2020,02:47:37.115 (-0516),hostname141
27/08/2020,02:47:37.115 (-0516),hostname141
27/08/2020,02:47:37.625 (-0516),hostname150

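2 answers:

Answer 0 (score: 2)

If you really need to filter at the granularity you describe, you will probably need regular expressions to pick the values out of each line.

This assumes that every line has the same kind of label text in front of the value you are looking for, so keep that in mind.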

[System.Collections.ArrayList]$filteredRows = @()
$log = Get-Content -Path C:\logfile.log
foreach ($row in $log) {
    # Extract the leading fields from the current line
    $date = ([regex]::Match($row,'^\d+\/\d+\/\d+')).Value
    $time = ([regex]::Match($row,'\d+:\d+:\d+\.\d+\s\(\S+\)')).Value
    $hostname = ([regex]::Match($row,'(?<=\d\d\d\d\)  )\w+')).Value
    # Usage counters: \d+ captures multi-digit values, and a missing value becomes 0
    $sessionModuleUsage = ([regex]::Match($row,'(?<=session module usage now )\d+')).Value
    if (!$sessionModuleUsage) {
        $sessionModuleUsage = 0
    }
    $sessionCategoryUsage = ([regex]::Match($row,'(?<=session category usage now )\d+')).Value
    if (!$sessionCategoryUsage) {
        $sessionCategoryUsage = 0
    }
    $moduleConcurrentUsage = ([regex]::Match($row,'(?<=total module concurrent usage now )\d+')).Value
    if (!$moduleConcurrentUsage) {
        $moduleConcurrentUsage = 0
    }
    $totalCategoryUsage = ([regex]::Match($row,'(?<=total category usage now )\d+')).Value
    if (!$totalCategoryUsage) {
        $totalCategoryUsage = 0
    }
    $hash = [ordered]@{
        Date = $date
        time = $time
        hostname = $hostname
        session_module_usage = $sessionModuleUsage
        session_category_usage = $sessionCategoryUsage
        module_concurrent_usage = $moduleConcurrentUsage
        total_category_usage = $totalCategoryUsage
    }
    $rowData = New-Object -TypeName 'psobject' -Property $hash
    $filteredRows.Add($rowData) > $null
}
$csv = $filteredRows | convertto-csv -NoTypeInformation -Delimiter "," | foreach {$_ -replace '"',''}
$csv | Out-File C:\results.csv

Essentially, what needs to happen is that we Get-Content the log, which returns an array with one item per line of the file.
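As a small illustration of that point, the sketch below writes two made-up lines to a temporary file and reads them back. The file and its contents exist only for the demo. Get-Content already yields one string per line, so no extra -split is needed; wrapping the call in @() is an addition here, to keep the result an array even when the file has a single line.

```powershell
# Hypothetical demo file; the real log path would go here instead.
$tmp = [System.IO.Path]::GetTempFileName()
Set-Content -Path $tmp -Value 'first line', 'second line'

# @() guarantees an array even when the file has only one line.
$lines = @(Get-Content -Path $tmp)

$lines.Count      # number of lines read
$lines[1]         # the second line, without its line terminator

Remove-Item -Path $tmp
```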

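Once we have the rows, we grab the values with regular expressions. Because you want zeros where certain values are absent, I have if statements that assign 0 whenever the regex returns nothing.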
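The match-or-zero step can also be factored into a small helper. This is only a sketch and not part of the answer's script: the function name Get-ValueOrZero and its parameters are hypothetical, and the sample strings are shortened stand-ins for real log lines.

```powershell
# Hypothetical helper: returns the first regex match in $Text, or 0 if there is none.
function Get-ValueOrZero {
    param(
        [string]$Text,
        [string]$Pattern
    )
    $m = [regex]::Match($Text, $Pattern)
    if ($m.Success) { $m.Value } else { 0 }
}

# Shortened sample lines (assumptions for the demo, not real log output).
$withUsage    = '(session module usage now 12, session category usage now 1)'
$withoutUsage = 'Deleting old log file'
$pattern      = '(?<=session module usage now )\d+'

$found   = Get-ValueOrZero -Text $withUsage    -Pattern $pattern   # '12'
$missing = Get-ValueOrZero -Text $withoutUsage -Pattern $pattern   # 0
```

Checking the Success property of the match makes the fallback explicit, instead of relying on an empty string being treated as false.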

Finally, we put the extracted values into a PSObject and, on each iteration, append that object to an array of objects.

Then we export to CSV.

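Answer 1 (score: 2)

You can probably take the lines apart quite easily with regular expressions and substrings. Basically something like the following: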

# Iterate over the lines of the input file
Get-Content F:\input.txt |
    ForEach-Object {
      # Extract the individual fields
      $Date = $_.Substring(0, 10)
      $Time = $_.Substring(12, $_.IndexOf(')') - 11)
      $Hostname = $_.Substring(34, $_.IndexOf(' ', 34) - 34)
      $session_module_usage = 0
      $session_category_usage  = 0
      $module_concurrent_usage = 0
      $total_category_usage = 0
      if ($_ -match 'session module usage now (\d+), session category usage now (\d+), total module concurrent usage now (\d+), total category usage now (\d+)') {
          $session_module_usage = $Matches[1]
          $session_category_usage  = $Matches[2]
          $module_concurrent_usage = $Matches[3]
          $total_category_usage = $Matches[4]
      }
      # Create custom object with those properties
      New-Object PSObject -Property @{
          Date = $Date
          time = $Time
          hostname = $Hostname
          session_module_usage = $session_module_usage
          session_category_usage = $session_category_usage
          module_concurrent_usage = $module_concurrent_usage
          total_category_usage = $total_category_usage
      }
    } |
    # Ensure column order in output
    Select-Object Date,time,hostname,session_module_usage,session_category_usage,module_concurrent_usage,total_category_usage |
    # Write as CSV - without quotes
    ConvertTo-Csv -NoTypeInformation |
    ForEach-Object { $_ -replace '"' } |
    Out-File F:\output.csv

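Whether to extract the date, time and hostname from the line with substrings or with a regex is probably a matter of taste. The same goes for how strictly the format must be matched, but to me that mostly depends on how rigid the format is. For more free-form things, where different lines match different regexes or several lines make up one record, I also quite like switch -Regex for iterating over the lines.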
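For illustration, a minimal sketch of the switch -Regex approach might look like this. The two sample lines are shortened from the log in the question, and the labels it emits ("usage: ...", "housekeeping", "other") are made up for the demo; a real script would build objects instead.

```powershell
# Two sample lines, shortened from the log in the question.
$lines = @(
    '27/08/2020  02:47:37.365 (-0516)  hostname12    ult_licesrv  ULT  5  (session module usage now 1, session category usage now 1, total module concurrent usage now 1, total category usage now 1)'
    '27/08/2020  02:47:37.115 (-0516)  hostname141    ult_licesrv  CMN  5  Deleting old log file'
)

# switch iterates over the array and tests each line against the patterns in order.
$result = switch -Regex ($lines) {
    # Lines that carry usage counters; capture groups land in $Matches
    'session module usage now (\d+), session category usage now (\d+)' {
        "usage: $($Matches[1]),$($Matches[2])"
        continue   # move on to the next line
    }
    # Housekeeping lines without counters
    'Deleting old log file' {
        'housekeeping'
        continue
    }
    # Anything that matched none of the patterns above
    default { 'other' }
}

$result
```

Each pattern gets its own script block, which is what makes this form convenient once different kinds of lines need different handling.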