I'm trying to add quote characters around two fields in a file of comma-separated lines. Here is one line of data:
1/22/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
I want it to become this:
1/22/2018 0:00:00,"0000000","001B9706BE",1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
I started developing my regex in a simple PowerShell script, and soon I had the following:
$strData = '1/29/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0'
$strNew = $strData -replace "([^,]*),([^,]*),([^,]*),(.*)",'$1,"$2","$3",$4'
$strNew
This gave me the following output:
1/29/2018 0:00:00,"0000000","001B9706BE",1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
Great! I thought I was all set. Then I extended this example to the general case of a file with similar lines of data:
Get-Content test_data.csv | Where-Object -FilterScript {
$_ -replace "([^,]*),([^,]*),([^,]*),(.*)", '$1,"$2","$3",$4'
}
Here is the listing of test_data.csv:
1/29/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104938428,0016C4C483,1,45,0,1,0,0,0,0,0,0,0,0,0,0,35,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104943875,0016C4B0BC,1,31,0,1,0,0,0,0,0,0,0,0,0,0,25,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104948067,0016C4834D,1,33,0,1,0,0,0,0,0,0,0,0,0,0,23,0,1,0,0,0,0,0,0,0,0,0,0
Here is the output of my script:
1/29/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104938428,0016C4C483,1,45,0,1,0,0,0,0,0,0,0,0,0,0,35,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104943875,0016C4B0BC,1,31,0,1,0,0,0,0,0,0,0,0,0,0,25,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104948067,0016C4834D,1,33,0,1,0,0,0,0,0,0,0,0,0,0,23,0,1,0,0,0,0,0,0,0,0,0,0
I also tried this version of the script:
Get-Content test_data.csv | Where-Object -FilterScript {
$_ -replace "([^,]*),([^,]*),([^,]*),(.*)", "`$1,`"`$2`",`"`$3`",$4"
}
and got the same result.
My simple test script convinced me that the regex is correct, but something goes wrong when I use that regex in the filter script of the Where-Object cmdlet.
What simple yet crucial detail am I overlooking here?
Here is my PSVersion:
Major  Minor  Build  Revision
-----  -----  -----  --------
5      0      10586  117
Answer 0 (score: 1)
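You misunderstand how Where-Object works. The cmdlet outputs those input lines for which the -FilterScript expression evaluates to $true. It does not output the result of whatever you do in that script block (that is what ForEach-Object is for). In your case, -replace returns a non-empty string for every line, which PowerShell treats as $true, so every line passes the filter and is emitted unchanged.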
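A minimal sketch of the difference, using two shortened lines as assumed stand-ins for the contents of test_data.csv (the variable names are illustrative only). The same -replace expression is run once as a Where-Object filter and once as a ForEach-Object body:

```powershell
# Two shortened sample lines (assumed stand-ins for test_data.csv).
$lines = @(
    '1/29/2018 0:00:00,0000000,001B9706BE,1,21',
    '1/29/2018 0:00:00,104938428,0016C4C483,1,45'
)
$pattern     = '^([^,]*),([^,]*),([^,]*)'
$replacement = '$1,"$2","$3"'

# Where-Object only uses the script block's result as a true/false test.
# A non-empty string is truthy, so each ORIGINAL line passes through unchanged.
$filtered = $lines | Where-Object { $_ -replace $pattern, $replacement }

# ForEach-Object emits whatever the script block returns: the modified line.
$projected = $lines | ForEach-Object { $_ -replace $pattern, $replacement }

$filtered    # original lines, no quotes added
$projected   # lines with the 2nd and 3rd fields quoted
```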
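You don't need Where-Object or ForEach-Object at all, though. Just wrap Get-Content in parentheses and use it as the first operand of the -replace operator; when the left operand is an array, -replace is applied to each element. You also don't need the fourth capture group. I would, however, recommend anchoring the expression at the beginning of the string.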
(Get-Content test_data.csv) -replace '^([^,]*),([^,]*),([^,]*)', '$1,"$2","$3"'
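To illustrate why the anchor matters once the fourth group is dropped, here is a small sketch on one shortened sample line (an assumed value, not from the file). -replace substitutes every match, so without ^ the three-group pattern keeps matching further along the line:

```powershell
# One shortened sample line (assumed; the real lines are longer).
$line = '1/29/2018 0:00:00,0000000,001B9706BE,1,21,0,1'

# Unanchored: after the first match, the pattern matches again later in the line
# (with an empty $1), so more fields get quoted.
$unanchored = $line -replace '([^,]*),([^,]*),([^,]*)', '$1,"$2","$3"'

# Anchored: ^ only matches at the start of the string, so there is a single match.
$anchored = $line -replace '^([^,]*),([^,]*),([^,]*)', '$1,"$2","$3"'

$unanchored  # 1/29/2018 0:00:00,"0000000","001B9706BE","1","21","0","1"
$anchored    # 1/29/2018 0:00:00,"0000000","001B9706BE",1,21,0,1
```

Both replacement strings are single-quoted, so PowerShell passes $1, $2 and $3 through to the regex engine untouched.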
Answer 1 (score: 0)
This seems to work. I used ForEach-Object to process each record.
Get-Content test_data.csv |
ForEach-Object { $_ -replace "([^,]*),([^,]*),([^,]*),(.*)", '$1,"$2","$3",$4' }
This also seems to work. It uses ? to make the first three captures reluctant (lazy).
Get-Content test_data.csv |
ForEach-Object { $_ -replace '(.*?),(.*?),(.*?),(.*)', '$1,"$2","$3",$4' }
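A short sketch of why the ? matters, on one assumed, shortened sample line. The greedy variant (.*) is added here only for contrast and is not part of the answer: greedy groups grab as much as possible, so they land on the last commas of the line, while lazy groups stop at the first comma they can:

```powershell
# One shortened sample line (assumed; the real lines are longer).
$line = '1/29/2018 0:00:00,0000000,001B9706BE,1,21,0,1'

# Greedy .* (for contrast only): the first group swallows everything up to the
# last three commas, so the wrong fields are quoted.
$greedy = $line -replace '(.*),(.*),(.*),(.*)', '$1,"$2","$3",$4'

# Lazy .*? : each of the first three groups stops at the next comma.
$lazy = $line -replace '(.*?),(.*?),(.*?),(.*)', '$1,"$2","$3",$4'

$greedy  # 1/29/2018 0:00:00,0000000,001B9706BE,1,"21","0",1
$lazy    # 1/29/2018 0:00:00,"0000000","001B9706BE",1,21,0,1
```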
Answer 2 (score: 0)
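I'd make a couple of small changes to what you have to get it working. Just change the script to the following. Note that I've replaced Where-Object -FilterScript with ForEach-Object, and fixed a small typo on the last item of the replacement string: $4 has to be escaped with a backtick like the other group references, otherwise PowerShell expands it as a (nonexistent) variable inside the double-quoted string and the rest of the line is lost:
Get-Content c:\temp\test_data.csv | ForEach-Object {
    $_ -replace "([^,]*),([^,]*),([^,]*),(.*)", "`$1,`"`$2`",`"`$3`",`$4"
}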
I tested this with the data you provided, and it adds the quotes to the correct columns.