Is it possible to replace Get-Content and ForEach-Object string matching with the Select-String cmdlet?

Time: 2015-05-24 02:52:22

Tags: regex powershell

I have a fixed-width file whose records have the following format:

DDEDM2018890                                                                 19960730015000010000
DDETPL015000                                                                 20150515015005010000
DDETPL015010                                                                 20150515015003010000
DDETPL015020                                                                 20150515015002010000
DDETPL015030                                                                 20150515015005010000
DDETPL015040                                                                 20150515015000010000

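The first 3 characters identify the record type. In the sample above every record is of type DDE, but the file also contains lines of other types.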

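The following regular expression with named capture groups parses the information I need from each record (note that it also filters the input down to DDE records):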

DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})

You can try out this regex on this excellent online parser.
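As a quick sanity check of what the pattern captures, here is a minimal sketch that applies it to one of the sample records above. The variable names ($line, $pattern, $record) are illustrative only, and the run of padding spaces in the sample line is shortened, which \s+ tolerates.

```powershell
# Illustrative only: one sample record with the padding shortened
# (\s+ matches any run of whitespace, so the capture groups are unaffected)
$line    = 'DDETPL015000      20150515015005010000'
$pattern = 'DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})'

if ($line -match $pattern) {
    # $Matches is a hashtable keyed by group name (plus key 0 for the whole match)
    $record = [PSCustomObject]@{
        Database     = $Matches['Database']
        CategoryCode = $Matches['CategoryCode']
        CategoryId   = $Matches['CategoryId']
        Length       = $Matches['Length']
    }
    $record
}
```

For this record the groups come out as Database TPL, CategoryCode 50, CategoryId 0 and Length 005.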

I wrote a script that uses the Get-Content, ForEach-Object and Select-Object cmdlets to convert the fixed-width file into a CSV file.

I'm wondering whether I can replace the Get-Content and ForEach-Object cmdlets with the Select-String cmdlet?

#this powershell script reads fixed width file and generates a csv file of the relevant & converted values

#Prepare a hashtable (calculated property) for Select-Object to convert CategoryCode and append CategoryId
$Category = @{
    Name = "Category"
    Expression = {
        $cat = switch($_.CategoryCode) 
        {
            "50"{"A"}
            "54"{"C"}
            "60"{"F"}
            "66"{"I"}
            "74"{"M"}
            "88"{"T"}
        } 
        $cat+$_.CategoryId
    }
}

gc "C:\Path\To\File.txt" | % { 
        if($_ -match "DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3}).*$")
        {
            #$matches is a hashtable of named capture groups; convert it to an object so Select-Object can treat its entries as properties
            [PSCustomObject]$matches
        }
    } | select Database, $Category, Length #| export-csv "AnalysisLengths.csv" -NoTypeInformation

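Before finalizing my script, I tried to use the Select-String cmdlet but couldn't figure out how to use it. I believe it could achieve the same result more elegantly. This is what I have: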

##Could this be completed with just the Select-String commandlet instead of Get-Content+ForEach+Select-Object?
Select-String -Path "C:\Path\To\File.txt" `
    -Pattern "DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})" `
    | Select-Object -ExpandProperty Matches 

Using -ExpandProperty turns the Matches property of each Microsoft.PowerShell.Commands.MatchInfo into the actual System.Text.RegularExpressions.Match objects for each line...
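To make the MatchInfo/Match relationship concrete, here is a minimal sketch that reads the named groups from the Match objects by name rather than by index. It pipes sample strings into Select-String instead of using -Path; the $lines array, the XYZ record and the $rows variable are assumptions added for illustration.

```powershell
# Illustrative only: sample records piped in as strings instead of read via -Path
$lines = @(
    'DDEDM2018890      19960730015000010000'
    'XYZ000000000      00000000000000000000'   # hypothetical non-DDE record, expected to be skipped
    'DDETPL015010      20150515015003010000'
)
$pattern = 'DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})'

$rows = $lines | Select-String -Pattern $pattern | ForEach-Object {
    # Each MatchInfo carries a Matches array; take the first Match on the line
    $groups = $_.Matches[0].Groups
    [PSCustomObject]@{
        Database     = $groups['Database'].Value
        CategoryCode = $groups['CategoryCode'].Value
        CategoryId   = $groups['CategoryId'].Value
        Length       = $groups['Length'].Value
    }
}
$rows
```

Indexing Groups by name keeps the code readable if the pattern later gains or loses a group, at the cost of one ForEach-Object step after Select-String.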

See also: Powershell Select-Object vs ForEach on Select-String results

3 answers:

Answer 0: (score: 2)

Here is one way (I'm not very proud of it):

Select-String -Path "C:\Path\To\File.txt" -Pattern "DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})" |
    % { New-Object -TypeName PSObject -Property @{
            Database     = $_.matches.groups[1]
            CategoryCode = $_.matches.groups[2]
            CategoryId   = $_.matches.groups[3]
            Length       = $_.matches.groups[4]
        } } |
    export-csv "C:\Path\To\File.csv"

Answer 1: (score: 2)

I don't know why you limit your question to the Select-String cmdlet. If you had included the switch statement, I would answer: yes! It is possible!

Here is some simple and short PowerShell code:

$(switch -Regex -File $fileIN { $patt { [pscustomobject]$matches | select * -ExcludeProperty 0 } }) | epcsv $fileCSV

where $fileIN is the input file, $fileCSV is the CSV file you want to create, and $patt is the pattern from your original post:

$patt='DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})'

The switch statement is very powerful.
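The same switch technique can be sketched without a file by switching over an array of strings, which makes it easy to see that non-matching lines are dropped. The $lines array, the XYZ record and the $rows variable are assumptions added for illustration; a real run would use -File as in the one-liner above.

```powershell
# Illustrative only: switch over an in-memory array instead of -File
$lines = @(
    'DDEDM2018890      19960730015000010000'
    'XYZ000000000      00000000000000000000'   # hypothetical non-DDE record, expected to be skipped
    'DDETPL015010      20150515015003010000'
)
$patt = 'DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})'

# switch -Regex tests each element against $patt and fills $Matches on success
$rows = switch -Regex ($lines) {
    $patt {
        # Drop the "0" entry (the whole match) so only the named groups remain
        [PSCustomObject]$Matches | Select-Object * -ExcludeProperty 0
    }
}
$rows
```

Because $Matches is an unordered hashtable, the column order of the resulting objects is not guaranteed; pipe through Select-Object with explicit property names if the CSV column order matters.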

Answer 2: (score: 1)

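While Select-String can combine Get-Content and the pattern matching, you still need a loop to build the custom objects. You could stick with what you have, although I'd suggest a few modifications: replace the switch statement with a hashtable, and turn the nested if into a Where-Object filter: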

$categories = @{
  '50' = 'A'
  '54' = 'C'
  '60' = 'F'
  '66' = 'I'
  '74' = 'M'
  '88' = 'T'
}

$category = @{
  Name       = 'Category'
  Expression = { $categories[$_.CategoryCode] + $_.CategoryId }
}

$pattern = 'DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})'

Get-Content 'C:\path\to\file.txt' |
  ? { $_ -match $pattern } |
  % { [PSCustomObject]$matches } |
  select Database, $category, Length |
  Export-Csv 'C:\path\to\output.csv' -NoType
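To see the hashtable-backed calculated property in isolation, here is a minimal sketch that applies it to a single hand-built object. The $parsed object stands in for one parsed record and the lookup table is deliberately cut down to two entries; both are assumptions for illustration.

```powershell
# Illustrative only: a reduced lookup table and one hand-built "parsed" record
$categories = @{
    '50' = 'A'
    '88' = 'T'
}

# Calculated property: map CategoryCode through the table and append CategoryId
$category = @{
    Name       = 'Category'
    Expression = { $categories[$_.CategoryCode] + $_.CategoryId }
}

$parsed = [PSCustomObject]@{
    Database     = 'TPL'
    CategoryCode = '50'
    CategoryId   = '0'
    Length       = '005'
}

$result = $parsed | Select-Object Database, $category, Length
$result
```

Here Category comes out as A0: the code 50 maps to A and the CategoryId 0 is appended.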

Or you could go with @JPBlanc's suggestion (again with slight modifications):

$category = @{
  '50' = 'A'
  '54' = 'C'
  '60' = 'F'
  '66' = 'I'
  '74' = 'M'
  '88' = 'T'
}

$pattern = "DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})"

Select-String -Path 'C:\path\to\file.txt' -Pattern $pattern | % {
  New-Object -TypeName PSObject -Property @{
    Database = $_.Matches.Groups[1].Value
    Category = $category[$_.Matches.Groups[2].Value] + $_.Matches.Groups[3].Value
    Length   = $_.Matches.Groups[4].Value
  }
} | Export-Csv 'C:\path\to\output.csv' -NoType

The latter gives slightly better performance, though not by much (execution time was 2:35 vs 2:50 for a 120 MB input file on my test box).