So, I have a CSV file that contains multiple lines like these:
"ABC-DEF-d98263","12345678","176568981","","588","ABC-DEF-11947","","GAUZE PACKING STRIPS 1/4"","","","2019-02-04T19:09:00-05:00","","XXX","XXX","2019-02-12T23:57:48-06:00","XXX-XXX-176568981"
"ABC-DEF-d1494751","98765432","98765432","1073552394","284","ABC-DEF-77997","","ACE WRAP 3"","","","2015-10-29T18:45:00-07:00","Sent","XXX","XXX","2018-04-05T19:38:41-05:00","XXX-XXX-76954940"
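I only want to replace "", with ", in the 8th column, that is, where it comes right after GAUZE PACKING STRIPS 1/4 or ACE WRAP 3, without touching the other "", sequences on the line.
I tried
sed 's/[[:alnum:]]""//g' file.csv
but that also deleted the character in front of the quotes.
Any ideas? Much appreciated!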
Answer 0: (score: 2)
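You can use a capture group to match a complete double-quoted field that is immediately followed by one more double quote, and then replace the whole match with just the captured field.
The regular expression to match looks like this:
("[^",]*")"
Note two things. First, each " is matched literally. Second, the [^",]* expression in the middle matches any run of characters other than " or , which keeps the matched string from containing a quote inside it.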
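To see what the bracket expression matches on its own, here is a small sketch. It assumes a grep that supports -o (GNU and BSD grep do), and the sample string is made up for illustration rather than taken from the question's file.

```shell
# Made-up sample: the second field ends with a stray extra quote.
sample='"a","b c""'
# -o prints each match on its own line; -E selects extended regular expressions.
fields=$(printf '%s\n' "$sample" | grep -oE '"[^",]*"')
printf '%s\n' "$fields"
```

Each printed match is one quoted field with no quote or comma inside it; the stray trailing quote is left over, which is the character the full expression targets.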
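Finally, the parentheses form a capture group: whatever is matched by the sub-expression between ( and ) can be referenced in the replacement with a backslash and a number. For example, \1 is replaced by the match of the first capture group, \3 by the match of the third, and so on.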
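As a minimal sketch of numbered back-references, the following swaps two captured groups. The input key=value is a hypothetical example, and -E is used because GNU and BSD sed both accept it for extended regular expressions.

```shell
# Hypothetical input: two capture groups, swapped in the replacement.
swapped=$(printf '%s\n' 'key=value' | sed -E 's/([a-z]+)=([a-z]+)/\2=\1/')
printf '%s\n' "$swapped"   # prints value=key
```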
The sed script you need could look like this:
sed -re 's/("[^",]*")"/\1/g'
Note how the last double quote is outside the capture group, so \1 does not put it back.
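Unescaped parentheses for capture groups are Extended Regular Expression (ERE) syntax, so the -r flag is needed to enable them (-E is the more portable spelling). Without it, sed uses Basic Regular Expressions (BRE), where groups are written \( and \) instead.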
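For comparison, here is a sketch of the same substitution in both syntaxes. The sample line is a shortened, made-up fragment in the style of the question's data.

```shell
# Shortened sample in the style of the question's data.
line='"ACE WRAP 3"","",""'
# ERE: plain parentheses form the capture group.
ere=$(printf '%s\n' "$line" | sed -E 's/("[^",]*")"/\1/g')
# BRE (sed's default): the parentheses must be escaped to form a group.
bre=$(printf '%s\n' "$line" | sed 's/\("[^",]*"\)"/\1/g')
printf '%s\n%s\n' "$ere" "$bre"
```

Both commands print "ACE WRAP 3","","" for this input, so the choice between them is mostly about readability.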
Also note the trailing /g. sed needs this flag to match and replace multiple occurrences on the same line.
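The effect of the flag can be sketched on a made-up line with two stray quotes:

```shell
# Made-up line with a stray extra quote after the first and second fields.
line='"A"","B"","C"'
# Without g, only the first occurrence on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed -E 's/("[^",]*")"/\1/')
# With g, every non-overlapping occurrence on the line is replaced.
all=$(printf '%s\n' "$line" | sed -E 's/("[^",]*")"/\1/g')
printf '%s\n%s\n' "$first_only" "$all"
```

The first result still contains the stray quote after "B", while the second is fully cleaned.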
Example:
$ cat test
"ABC-DEF-d98263","12345678","176568981","","588","ABC-DEF-11947","","GAUZE PACKING STRIPS 1/4"","","","2019-02-04T19:09:00-05:00",""","XXX","XXX","2019-02-12T23:57:48-06:00"","XXX-XXX-176568981"
$ cat test | sed -re 's/("[^",]*")"/\1/g'
"ABC-DEF-d98263","12345678","176568981","","588","ABC-DEF-11947","","GAUZE PACKING STRIPS 1/4","","","2019-02-04T19:09:00-05:00","","XXX","XXX","2019-02-12T23:57:48-06:00","XXX-XXX-176568981"
Answer 1: (score: 0)
Using awk:
$ awk '
BEGIN { FS=OFS="," }            # set delimiters
{
    if($8!="\"\"")              # if $8 is not empty ie. ""
        sub(/\"\"$/,"\"",$8)    # replace trailing double quotes with a single double quote
}1' file                        # output
Output:
"ABC-DEF-d98263","12345678","176568981","","588","ABC-DEF-11947","","GAUZE PACKING STRIPS 1/4","","","2019-02-04T19:09:00-05:00","","XXX","XXX","2019-02-12T23:57:48-06:00","XXX-XXX-176568981"
"ABC-DEF-d1494751","98765432","98765432","1073552394","284","ABC-DEF-77997","","ACE WRAP 3","","","2015-10-29T18:45:00-07:00","Sent","XXX","XXX","2018-04-05T19:38:41-05:00","XXX-XXX-76954940"