Regex doesn't seem to work in the Where-Object cmdlet

Posted: 2018-02-02 22:51:51

Tags: regex powershell pipe

I'm trying to add quote characters around two fields in a file of comma-separated lines. Here is one line of the data:

1/22/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0

I want it to become this:

1/22/2018 0:00:00,"0000000","001B9706BE",1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0

I started developing my regex in a simple PowerShell script, and I soon had the following:

$strData = '1/29/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0'
$strNew = $strData -replace "([^,]*),([^,]*),([^,]*),(.*)",'$1,"$2","$3",$4'
$strNew

which gave me this output:

1/29/2018 0:00:00,"0000000","001B9706BE",1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0

Great! I was all set. Next, I extended this example to the general case of a file with similar lines of data:

Get-Content test_data.csv | Where-Object -FilterScript {
    $_ -replace "([^,]*),([^,]*),([^,]*),(.*)", '$1,"$2","$3",$4'
}

Here is the listing of test_data.csv:

1/29/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104938428,0016C4C483,1,45,0,1,0,0,0,0,0,0,0,0,0,0,35,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104943875,0016C4B0BC,1,31,0,1,0,0,0,0,0,0,0,0,0,0,25,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104948067,0016C4834D,1,33,0,1,0,0,0,0,0,0,0,0,0,0,23,0,1,0,0,0,0,0,0,0,0,0,0

Here is the output of my script:

1/29/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104938428,0016C4C483,1,45,0,1,0,0,0,0,0,0,0,0,0,0,35,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104943875,0016C4B0BC,1,31,0,1,0,0,0,0,0,0,0,0,0,0,25,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104948067,0016C4834D,1,33,0,1,0,0,0,0,0,0,0,0,0,0,23,0,1,0,0,0,0,0,0,0,0,0,0

I also tried this version of the script:

Get-Content test_data.csv | Where-Object -FilterScript {
    $_ -replace "([^,]*),([^,]*),([^,]*),(.*)", "`$1,`"`$2`",`"`$3`",$4"
}

and got the same result.

My simple test script convinced me that the regex is correct, but something happens when I use that regex in the filter script of the Where-Object cmdlet.

What simple but crucial detail am I overlooking here?

Here is my PSVersion:

Major  Minor  Build  Revision
-----  -----  -----  --------
5      0      10586  117

3 Answers

Answer 0 (score: 1)

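You're misunderstanding how Where-Object works. The cmdlet outputs those input lines for which the -FilterScript expression evaluates to $true. It does not output the result of anything you do inside that script block (for that you would use ForEach-Object). Because your -replace expression always returns a non-empty string, which PowerShell treats as $true, every line passes the filter unchanged.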
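A minimal sketch of the difference, assuming two sample lines from test_data.csv shortened to their first five fields; the variable names are illustrative only. The same script block is run through both cmdlets:

```powershell
# Two sample lines from test_data.csv, shortened for brevity
$lines = @(
    '1/29/2018 0:00:00,0000000,001B9706BE,1,21',
    '1/29/2018 0:00:00,104938428,0016C4C483,1,45'
)
$pattern     = '^([^,]*),([^,]*),([^,]*)'
$replacement = '$1,"$2","$3"'

# Where-Object only decides whether to keep each input object.
# The -replace result is a non-empty string, which is truthy,
# so every original line passes through unchanged.
$filtered = $lines | Where-Object { $_ -replace $pattern, $replacement }

# ForEach-Object emits whatever the script block outputs,
# so the rewritten strings are what leave the pipeline.
$transformed = $lines | ForEach-Object { $_ -replace $pattern, $replacement }

$filtered
$transformed
```

$filtered should hold the two lines exactly as they were read, while $transformed should hold the versions with the second and third fields quoted.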

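However, you need neither Where-Object nor ForEach-Object here. Just wrap Get-Content in parentheses and use it as the first operand of the -replace operator, which then applies the replacement to every line. You don't need the fourth capture group either. I would, however, anchor the expression at the start of the string; without the fourth group, an unanchored pattern keeps matching further along each line.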

(Get-Content test_data.csv) -replace '^([^,]*),([^,]*),([^,]*)', '$1,"$2","$3"'
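A small sketch of both points, assuming an in-memory array stands in for (Get-Content test_data.csv) and the lines are shortened to five fields. It compares the anchored pattern with the same pattern minus the ^ anchor:

```powershell
# Stand-in for (Get-Content test_data.csv); lines shortened for brevity
$lines = @(
    '1/29/2018 0:00:00,0000000,001B9706BE,1,21',
    '1/29/2018 0:00:00,104938428,0016C4C483,1,45'
)

# -replace applied to an array returns a new array,
# with the replacement performed on each element.
$anchored = $lines -replace '^([^,]*),([^,]*),([^,]*)', '$1,"$2","$3"'

# Without ^ and without a catch-all fourth group, -replace keeps
# finding further matches after the first three fields, so later
# fields get quoted as well.
$unanchored = $lines -replace '([^,]*),([^,]*),([^,]*)', '$1,"$2","$3"'

$anchored
$unanchored
```

With these shortened lines, the unanchored version quotes the last two fields too (for example, ending in "1","21"); on the full 30-field lines, it would quote many more.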

Answer 1 (score: 0)

This seems to work. I used ForEach-Object to process each record.

Get-Content test_data.csv |
    ForEach-Object { $_ -replace "([^,]*),([^,]*),([^,]*),(.*)", '$1,"$2","$3",$4' }
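To persist the result, the ForEach-Object output can be piped on to Set-Content. This is a sketch under stated assumptions: the temp-file paths and the output name test_data_quoted.csv are hypothetical, and the input file is generated from two shortened sample lines so the example is self-contained:

```powershell
# Hypothetical temp-file paths so the sketch is self-contained
$inPath  = Join-Path ([IO.Path]::GetTempPath()) 'test_data.csv'
$outPath = Join-Path ([IO.Path]::GetTempPath()) 'test_data_quoted.csv'

# Create a small input file (sample lines shortened for brevity)
Set-Content -Path $inPath -Value @(
    '1/29/2018 0:00:00,0000000,001B9706BE,1,21',
    '1/29/2018 0:00:00,104938428,0016C4C483,1,45'
)

# ForEach-Object emits each rewritten line; Set-Content writes
# everything that arrives down the pipeline to a separate file.
Get-Content $inPath |
    ForEach-Object { $_ -replace '([^,]*),([^,]*),([^,]*),(.*)', '$1,"$2","$3",$4' } |
    Set-Content -Path $outPath

Get-Content $outPath
```

Writing to a separate file is the safer choice here: in a streaming pipeline, Get-Content may still be reading the input file while Set-Content tries to write it.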

This also seems to work. It uses ? to make the first three captures reluctant (lazy).

Get-Content test_data.csv |
    ForEach-Object { $_ -replace '(.*?),(.*?),(.*?),(.*)', '$1,"$2","$3",$4' }
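Why the lazy form and the [^,]* form agree, and why a plain greedy .* would not, can be sketched on one shortened sample line. The ^ anchor and the trailing comma, used here in place of a fourth group, are choices made for this illustration:

```powershell
$line = '1/29/2018 0:00:00,0000000,001B9706BE,1,21'

# Lazy quantifiers: each (.*?) stops at the first comma it can.
$lazy = $line -replace '^(.*?),(.*?),(.*?),', '$1,"$2","$3",'

# Negated character class: [^,]* can never cross a comma at all.
$negated = $line -replace '^([^,]*),([^,]*),([^,]*),', '$1,"$2","$3",'

# Greedy .* grabs as much as it can, so the groups shift to the
# last commas that still allow a match and the wrong fields get quoted.
$greedy = $line -replace '^(.*),(.*),(.*),', '$1,"$2","$3",'

$lazy
$negated
$greedy
```

The first two should both print 1/29/2018 0:00:00,"0000000","001B9706BE",1,21, whereas the greedy version quotes 001B9706BE and 1 instead. The negated class is arguably the clearer intent, since it states directly that a field cannot contain a comma.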

Answer 2 (score: 0)

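I would make a couple of small changes to what you have to get it working. Just change the script to the following. Note that I replaced Where-Object with ForEach-Object, and escaped the $ in front of the last group reference ($4) in the double-quoted replacement string so that PowerShell doesn't expand it as a variable:

Get-Content c:\temp\test_data.csv | ForEach-Object {
    $_ -replace "([^,]*),([^,]*),([^,]*),(.*)", "`$1,`"`$2`",`"`$3`",`$4"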
}

I tested this with the data you provided, and the quotes were added to the correct columns.
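The escaping issue from the question's second attempt can be reproduced on a single shortened line. This sketch assumes strict mode is off (the default), because under Set-StrictMode referencing the undefined variable $4 would raise an error instead of expanding to an empty string:

```powershell
$line    = '1/29/2018 0:00:00,0000000,001B9706BE,1,21'
$pattern = '([^,]*),([^,]*),([^,]*),(.*)'

# Unescaped $4 inside double quotes: PowerShell expands the undefined
# variable $4 to an empty string before -replace sees it, so the
# rest of the line captured by (.*) is dropped.
$unescaped = $line -replace $pattern, "`$1,`"`$2`",`"`$3`",$4"

# Backtick-escaped `$4: the regex engine receives a literal $4.
$escaped = $line -replace $pattern, "`$1,`"`$2`",`"`$3`",`$4"

# Single quotes skip PowerShell interpolation entirely, so no escaping is needed.
$single = $line -replace $pattern, '$1,"$2","$3",$4'

$unescaped
$escaped
$single
```

The single-quoted form is usually the simplest choice for -replace substitutions, since only the regex engine interprets the $ references.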