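Need help parsing a text file

Time: 2016-09-05 19:27:52

Tags: powershell

I need help working out the logic for parsing a text file whose current formatting makes the log content very hard to read. The input text file looks like this: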

========== Test1 (1) ========== Id UTC Date/Time Message 4d1eb19c-5420-4bb2-9e21-65880eb90429 08-30T01:26:24Z Messagel Name='abz', Connection='Usb', Fleet Report Id='ca9d3457-1564-4066-8f5e-12345678', Fleet Proxy Id ='ghjfda7-c7e8-4bb2-9dd4-2f4c3b2498a3,4d1eb19c-5420-4bb2-9e21-65880eb90429  08-30T01:26:24Z  Message2 Name='abz', Connection='Usb', Fleet Report Id='ca9d3457-1564-4066-8f5e-12345678', Fleet Proxy Id ='ghjfda7-c7e8-4bb2-9dd4-2f4c3b2498a3.,4d1eb19c-5420-4bb2-9e21-65880eb90429  08-30T01:26:24Z  Message2 Name='abz', Connection='Usb', Fleet Report Id='ca9d3457-1564-4066-8f5e-12345678', Fleet Proxy Id ='ghjfda7-c7e8-4bb2-9dd4-2f4c3b2498a3.

========== Test2 (1) ========== Id UTC Date/Time Message 4d1eb19c-5420-4bb2-9e21-65880eb90429 08-30T01:26:24Z Message2 Name='xyz', Connection='Usb', Fleet Report Id='ca9d09e7-1564-4066-8f5e-6a123456', Fleet Proxy Id ='0fsfsda7-c7e8-4bb2-9dd4-2f4c3b2498a3,4d1eb19c-5420-4bb2-9e21-65880eb90429  08-30T01:26:24Z  Message2 Name='abz', Connection='Usb', Fleet Report Id='ca9d3457-1564-4066-8f5e-12345678', Fleet Proxy Id ='ghjfda7-c7e8-4bb2-9dd4-2f4c3b2498a3.,4d1eb19c-5420-4bb2-9e21-65880eb90429  08-30T01:26:24Z  Message2 Name='abz', Connection='Usb', Fleet Report Id='ca9d3457-1564-4066-8f5e-12345678', Fleet Proxy Id ='ghjfda7-c7e8-4bb2-9dd4-2f4c3b2498a3.

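There are multiple sections {Test1, Test2 ... n}, and each section contains multiple entries, each made up of an Id, a UTC date/time, and a message. Every section is also wrapped in an opening and a closing markup tag.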

How can I arrange these in a table? The output needs to be formatted as a table like this:

ID UTC Date/Time Message

========== Test1 (1) ==========

Id                                    UTC Date/Time    Message 
4d1eb19c-5420-4bb2-9e21-65880eb90429  08-30T01:26:24Z  Messagel Name='abz', Connection='Usb', Fleet Report Id='ca9d3457-1564-4066-8f5e-12345678', Fleet Proxy Id ='ghjfda7-c7e8-4bb2-9dd4-2f4c3b2498a3.

4d1eb19c-5420-4bb2-9e21-65880eb90429  08-30T01:26:24Z  Message2 Name='abz', Connection='Usb', Fleet Report Id='ca9d3457-1564-4066-8f5e-12345678', Fleet Proxy Id ='ghjfda7-c7e8-4bb2-9dd4-2f4c3b2498a3.

4d1eb19c-5420-4bb2-9e21-65880eb90429  08-30T01:26:24Z  Message3 Name='abz', Connection='Usb', Fleet Report Id='ca9d3457-1564-4066-8f5e-12345678', Fleet Proxy Id ='ghjfda7-c7e8-4bb2-9dd4-2f4c3b2498a3.

========== Test2 (1) ========== 
Id                                    UTC Date/Time   Message 
4d1eb19c-5420-4bb2-9e21-65880eb90429  08-30T01:26:24Z Message1 Name='xyz', Connection='Usb', Fleet Report Id='ca9d09e7-1564-4066-8f5e-6a123456', Fleet Proxy Id ='0fsfsda7-c7e8-4bb2-9dd4-2f4c3b2498a3,

4d1eb19c-5420-4bb2-9e21-65880eb90429  08-30T01:26:24Z  Message2 Name='abz', Connection='Usb', Fleet Report Id='ca9d3457-1564-4066-8f5e-12345678', Fleet Proxy Id ='ghjfda7-c7e8-4bb2-9dd4-2f4c3b2498a3,

4d1eb19c-5420-4bb2-9e21-65880eb90429  08-30T01:26:24Z  Message3 Name='abz', Connection='Usb', Fleet Report Id='ca9d3457-1564-4066-8f5e-12345678', Fleet Proxy Id ='ghjfda7-c7e8-4bb2-9dd4-2f4c3b2498a3.

This is what I have tried, but it does not parse everything in the text file.

$file = Get-Content -path .\ViewSource.txt | Where-Object {
  $_ -ne ""
} | ForEach-Object {
  $_ -replace '<[^>]+>', ''
}
foreach ($line in $file) {
  $elements = $line.Split(" ", [StringSplitOptions]::RemoveEmptyEntries)
  [PSCustomObject]@{
    Id          = $elements[8]
    UtcDateTime = $elements[9]
    Message     = $elements[10..19] -join " "
  }
}

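1 Answer:

Answer 0 (score: 0)

Since your ID and timestamp fields have a fixed width, and there do not seem to be multiple messages per line, the simplest approach is probably to replace the "inline" header with a properly formatted, wrapped header line: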

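# Header as it appears inline in the input, including the surrounding spaces.
$inline  = ' Id UTC Date/Time Message '
# The same header on its own line, padded to line up with the fixed-width fields.
$wrapped = "`nId                                   UTC Date/Time   Message`n"
# -replace is applied to every line returned by Get-Content.
(Get-Content -Path 'C:\path\to\input.txt') -replace $inline, $wrapped |
    Set-Content -Path 'C:\path\to\output.txt'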
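
To see what that replacement does without touching any files, here is a minimal sketch that runs it against a single in-memory string. The `$sample` line is shortened from the question (its message text is abbreviated for illustration), and `$result` and `$lines` are names introduced only for this example.

```powershell
# Shortened sample line; the message text is abbreviated for illustration.
$sample = "========== Test1 (1) ========== Id UTC Date/Time Message 4d1eb19c-5420-4bb2-9e21-65880eb90429 08-30T01:26:24Z Message1 Name='abz'"

$inline  = ' Id UTC Date/Time Message '
$wrapped = "`nId                                   UTC Date/Time   Message`n"

# -replace treats $inline as a regular expression. It contains no regex
# metacharacters, so it matches the header text literally.
$result = $sample -replace $inline, $wrapped

# Split on the inserted newlines to inspect the individual output lines.
$lines = $result -split "`n"
$lines
```

The section title, the header, and the first record each end up on their own line. Because the pattern includes the leading and trailing space, no stray whitespace is left around the header.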

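Edit: If there are multiple messages per line, you also need to match the GUID and timestamp sequence that precedes each message, and insert a newline before each of those matches: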

$inline  = ' Id UTC Date/Time Message '
$wrapped = "`nId                                   UTC Date/Time   Message"

# Patterns for the GUID and the month-day timestamp that start each message.
$guid = '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
$ts   = '\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z'

# Move the header to its own line, then start a new line before every GUID/timestamp pair.
(Get-Content 'C:\path\to\input.txt') -replace $inline, $wrapped -replace "($guid) +($ts) +", "`n`$1 `$2 " |
    Set-Content -Path 'C:\path\to\output.txt'
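
If you would rather end up with objects than with reformatted text (closer to what the attempt in the question was doing), the same GUID and timestamp patterns can be used to pull out each record. This is only a sketch under a few assumptions: `ConvertTo-LogRecord` is a hypothetical helper name, the sample line is abbreviated from the question, and a message is assumed to run until the next GUID/timestamp pair or the end of the line.

```powershell
$guid = '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
$ts   = '\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z'

# Hypothetical helper (not part of the answer above): turns one section line
# into one object per message.
function ConvertTo-LogRecord {
    param([string]$Line)

    # Section title between the two runs of '=' characters, e.g. "Test1 (1)".
    $section = if ($Line -match '=+\s*(.+?)\s*=+') { $Matches[1] } else { $null }

    # A record is: GUID, timestamp, then message text up to (but not including)
    # the optional comma before the next GUID/timestamp pair, or the end of the line.
    $pattern = "($guid) +($ts) +(.*?)(?=,?\s*$guid +$ts|`$)"

    # IgnoreCase mirrors the case-insensitive behaviour of -replace.
    foreach ($m in [regex]::Matches($Line, $pattern, 'IgnoreCase')) {
        [PSCustomObject]@{
            Section     = $section
            Id          = $m.Groups[1].Value
            UtcDateTime = $m.Groups[2].Value
            Message     = $m.Groups[3].Value.Trim()
        }
    }
}

# Abbreviated sample with two messages on one line, separated by a comma.
$sample = "========== Test1 (1) ========== Id UTC Date/Time Message 4d1eb19c-5420-4bb2-9e21-65880eb90429 08-30T01:26:24Z Message1 Name='abz',4d1eb19c-5420-4bb2-9e21-65880eb90429  08-30T01:26:24Z  Message2 Name='abz'."

$records = @(ConvertTo-LogRecord -Line $sample)
$records | Format-Table -AutoSize -Wrap
```

For a real file you could pipe `Get-Content` through `ForEach-Object { ConvertTo-LogRecord -Line $_ }` and then group or export the objects as needed. The lookahead keeps the separating comma out of the message, but any other trailing punctuation in the log (such as the final period) is kept as it is.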