Using Select-String to match multiple patterns on a single line and write the output

Time: 2019-02-01 18:51:19

Tags: regex powershell select-string

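I'm trying to build a simple script that uses regular expressions to match multiple patterns on a single line, throughout the whole input file, and writes the results to an output file. But I've hit a wall: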

Sample text:

BMC12345 COMBINED PHASE STATISTICS:  31 ROWS SELECTED FOR SPACE 'KDDT111D.DIH0345S', 0 ROWS SELECTED BUT DISCARDED DUE TBMC123456 COMBINED PHASE STATISTICS:  10 PHYSICAL (10 LOGICAL) RECORDS DISCARDED TO SYSDISC

Here's what I have so far:

$table = [regex] "'.*'"
$discard = [regex] "\d* PHYSICAL"

Select-String -Pattern ($table, $discard) -AllMatches .\test.txt | foreach {
    $_.Matches.Value
} > output.txt

Output:

'KDDT111D.DIH0345S'

Desired output:

'KDDT111D.DIH0345S' 10 Physical

For some reason, I can't get both patterns written to output.txt. Ideally, once I have this working, I'd like to use Export-Csv to get something cleaner, such as:

|KDDT111D|DIH0345S|10 Physical|

3 answers:

Answer 0 (score: 1)

I think you'll find the -match operator better suited to this. [grin] Using named matches against the sample stored in $InStuff, this...

$InStuff -match ".+SPACE '(?<Space>.+)\.(?<SubSpace>.+)'.+: (?<Discarded>.+) \(.+"

...gives the following matches...

Name                           Value                                                                              
----                           -----                                                                              
Space                          KDDT111D                                                                           
SubSpace                       DIH0345S                                                                           
Discarded                      10 PHYSICAL                                                                        
0                              BMC12345 COMBINED PHASE STATISTICS: 31 ROWS SELECTED FOR SPACE 'KDDT111D.DIH0345...

The named matches can then be accessed as $Matches.<the capture group name>.
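As a minimal sketch of the named-capture approach, the three groups can be formatted into the pipe-delimited row the question asks for. This assumes, hypothetically, that $InStuff holds the question's sample line verbatim, including the double space after each colon; because of that double space the Discarded group starts with a space, so the sketch adds a .Trim() call that is not part of the answer above.

```powershell
# Assumption: $InStuff holds the question's sample line verbatim
$InStuff = "BMC12345 COMBINED PHASE STATISTICS:  31 ROWS SELECTED FOR SPACE 'KDDT111D.DIH0345S', 0 ROWS SELECTED BUT DISCARDED DUE TBMC123456 COMBINED PHASE STATISTICS:  10 PHYSICAL (10 LOGICAL) RECORDS DISCARDED TO SYSDISC"

# The regex from this answer, with three named capture groups
$regex = ".+SPACE '(?<Space>.+)\.(?<SubSpace>.+)'.+: (?<Discarded>.+) \(.+"

$row = $null
if ($InStuff -match $regex) {
    # Named groups are exposed as keys of the automatic $Matches hashtable;
    # Trim() drops the leading space left over from the double space after the colon
    $row = '|{0}|{1}|{2}|' -f $Matches.Space, $Matches.SubSpace, $Matches.Discarded.Trim()
}

$row   # |KDDT111D|DIH0345S|10 PHYSICAL|
```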

Answer 1 (score: 1)

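You're running into a Select-String limitation: the .Matches property of the [Microsoft.PowerShell.Commands.MatchInfo] objects that Select-String emits for each input object (line) only contains the (possibly multiple) matches of the *first* regex passed to the -Pattern parameter. [1]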

You can work around the problem by passing a single regex instead, combining the input regexes via alternation (|):

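Select-String -Pattern ($table, $discard -join '|') -AllMatches .\test.txt |
  ForEach-Object { $_.Matches.Value } > output.txt

A simplified example:

# ('f.', '.z' -join '|') -> 'f.|.z'
'foo bar baz' | Select-String -AllMatches ('f.', '.z' -join '|') |
  ForEach-Object { $_.Matches.Value }

The above yields:

fo
az

This demonstrates that matches for both regexes were reported.

Caveat regarding output ordering: with alternation (|), the matches for a given input string are reported in the order in which they are found in the input, not in the order in which the regexes were specified. That is, -Pattern 'f.|.z' and -Pattern '.z|f.' both produce the same output order above.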
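To get one output line per input line, as in the question's desired output ('KDDT111D.DIH0345S' 10 Physical), the matches found on each line can be joined instead of emitted separately. A minimal sketch, assuming the question's sample line is piped in directly rather than read from .\test.txt:

```powershell
# Assumption: the question's sample line, fed in directly instead of via .\test.txt
$line = "BMC12345 COMBINED PHASE STATISTICS:  31 ROWS SELECTED FOR SPACE 'KDDT111D.DIH0345S', 0 ROWS SELECTED BUT DISCARDED DUE TBMC123456 COMBINED PHASE STATISTICS:  10 PHYSICAL (10 LOGICAL) RECORDS DISCARDED TO SYSDISC"

$table   = [regex] "'.*'"
$discard = [regex] "\d* PHYSICAL"

# One alternation regex; then join all matches found on each line with a space
$result = $line |
  Select-String -Pattern ($table, $discard -join '|') -AllMatches |
  ForEach-Object { $_.Matches.Value -join ' ' }

$result   # 'KDDT111D.DIH0345S' 10 PHYSICAL
```

The ordering caveat still applies: the pieces are joined in the order they occur on the line, which in this log format happens to be the desired order.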


[1] This problem exists as of Windows PowerShell v5.1 / PowerShell Core 6.2.0-preview.4 and is discussed in this GitHub issue.

Answer 2 (score: 0)

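Thanks to the contributors for the ideas and the learning experience. By combining the two answers, I was able to get the desired output.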

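I found that the -match operator only returned the first occurrence of a pattern match from the source file, so I needed to add a foreach loop to return the matches for every line of the log file.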

I also modified the regex so that it only matches lines whose discarded count is greater than 0.
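The original command that returned only the first occurrence isn't shown, so the following is an assumption about the cause: applied to one string (for example, a whole file read with Get-Content -Raw), -match populates $Matches with the first match only; applied to a collection (such as the array of lines Get-Content returns), it acts as a filter and doesn't populate $Matches at all. A minimal sketch of the per-line alternative, using two lines copied from the sample text below and the regex from the script below:

```powershell
# Regex from the script below: Discarded is captured only when the count is nonzero
$regex = ".+SPACE '(?<Space>.+)\.(?<SubSpace>.+)'.+: (?<Discarded>.+) .[1-9][0-9]*\s\b"

# Two lines copied from the sample text below: 0 and 11 discarded records
$lines = @(
    "BMC51472I COMBINED PHASE STATISTICS:  0 ROWS SELECTED FOR SPACE 'KDDT000D.KDAICH0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC"
    "BMC51472I COMBINED PHASE STATISTICS:  9185775 ROWS SELECTED FOR SPACE 'KDDT000D.KDIADR0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  11 PHYSICAL (11 LOGICAL) RECORDS DISCARDED TO SYSDISC"
)

# Collection on the left: -match returns the matching elements, but $Matches is not populated
$filtered = $lines -match $regex

# One string at a time: each successful -match populates $Matches for that line
$rows = foreach ($line in $lines) {
    if ($line -match $regex) {
        $Matches.Space, $Matches.SubSpace, $Matches.Discarded -join '|'
    }
}

$filtered.Count   # 1 (only the line with a nonzero discard count)
$rows             # KDDT000D|KDIADR0S| 11 PHYSICAL
```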

Sample text:

BMC51472I COMBINED PHASE STATISTICS:  0 ROWS SELECTED FOR SPACE 'KDDT000D.KDAICH0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  3499604 ROWS SELECTED FOR SPACE 'KDDT000D.KDAIND0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  1 ROWS SELECTED FOR SPACE 'KDDT000D.KDCISR0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  9185775 ROWS SELECTED FOR SPACE 'KDDT000D.KDIADR0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  11 PHYSICAL (11 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  0 ROWS SELECTED FOR SPACE 'KDDT000D.KDICHT0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  2387375 ROWS SELECTED FOR SPACE 'KDDT000D.KDICMS0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  1632821 ROWS SELECTED FOR SPACE 'KDDT000D.KDIPRV0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC
BMC51472I COMBINED PHASE STATISTICS:  0 ROWS SELECTED FOR SPACE 'KDDT000D.KDLADD0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  24845 PHYSICAL (24845 LOGICAL) RECORDS DISCARDED TO SYSDISC

Example:

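$regex = ".+SPACE '(?<Space>.+)\.(?<SubSpace>.+)'.+: (?<Discarded>.+) .[1-9][0-9]*\s\b"

# Date stamp for the output file name, e.g. 02_01_19
$timestamp = Get-Date -Format "MM_dd_yy"
$dir = "C:\Users\JonMonJovi"

# Read every line of every *.log.txt file in $dir
Get-Content "$dir\*.log.txt" | ForEach-Object {
    # -match populates the automatic $Matches variable only when this line matches
    if ($_ -match $regex) {
        # Emit the three named capture groups as one pipe-delimited line
        $Matches.Space, $Matches.SubSpace, $Matches.Discarded -join "|"
    }
} > "$dir\Discarded\Discard_Log_$timestamp.txt"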

Output:

KDDT000D|KDIADR0S| 11 PHYSICAL
KDDT000D|KDLADD0S| 24845 PHYSICAL

From here, I can import the pipe-delimited .txt output file into Excel, which meets my requirements.
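Since the question originally aimed for Export-Csv, here is a hedged sketch of a variant that emits objects instead of joined strings and lets Export-Csv write the pipe-delimited file. It swaps in a different technique for the nonzero filter: the discarded count is captured as digits and compared numerically instead of being excluded by the regex itself. The function name ConvertTo-DiscardRecord, its property names, the narrower regex and the .csv file name are all hypothetical, and it assumes every relevant line follows the sample format.

```powershell
# Hypothetical helper: turns log lines into objects, keeping only nonzero discard counts
function ConvertTo-DiscardRecord {
    param(
        [Parameter(ValueFromPipeline)]
        [string] $Line
    )
    begin {
        # Assumed format: SPACE '<space>.<subspace>' ... STATISTICS:  <n> PHYSICAL
        $regex = "SPACE '(?<Space>[^.']+)\.(?<SubSpace>[^']+)'.*?:\s+(?<Discarded>\d+) PHYSICAL"
    }
    process {
        # -and short-circuits, so $Matches is only read after a successful match
        if ($Line -match $regex -and [long] $Matches.Discarded -gt 0) {
            [pscustomobject]@{
                Space     = $Matches.Space
                SubSpace  = $Matches.SubSpace
                Discarded = "$($Matches.Discarded) PHYSICAL"
            }
        }
    }
}

# Usage against the log files (hypothetical output file name):
# Get-Content "C:\Users\JonMonJovi\*.log.txt" | ConvertTo-DiscardRecord |
#     Export-Csv -Path "C:\Users\JonMonJovi\Discarded\Discard_Log.csv" -Delimiter '|' -NoTypeInformation

# Quick check with two lines copied from the sample text above
$records = @(
    "BMC51472I COMBINED PHASE STATISTICS:  0 ROWS SELECTED FOR SPACE 'KDDT000D.KDAICH0S', 0 ROWS SELECTED BUT DISCARDED DUE TOBMC51479I COMBINED PHASE STATISTICS:  0 PHYSICAL (0 LOGICAL) RECORDS DISCARDED TO SYSDISC"
    "BMC51472I COMBINED PHASE STATISTICS:  9185775 ROWS SELECTED FOR SPACE 'KDDT000D.KDIADR0S', 0 ROWS SELECTED BUT DISCARDED BMC51479I COMBINED PHASE STATISTICS:  11 PHYSICAL (11 LOGICAL) RECORDS DISCARDED TO SYSDISC"
) | ConvertTo-DiscardRecord

$records
```

Export-Csv wraps each field in double quotes by default; Excel handles that on import, and PowerShell 7's -UseQuotes parameter can turn it off if plain fields are needed.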