How to use sed to replace specific characters in a substring

Time: 2019-10-19 01:13:22

Tags: regex csv sed

So, I have a CSV file containing multiple lines like these:

"ABC-DEF-d98263","12345678","176568981","","588","ABC-DEF-11947","","GAUZE PACKING STRIPS 1/4"","","","2019-02-04T19:09:00-05:00","","XXX","XXX","2019-02-12T23:57:48-06:00","XXX-XXX-176568981"
"ABC-DEF-d1494751","98765432","98765432","1073552394","284","ABC-DEF-77997","","ACE WRAP 3"","","","2015-10-29T18:45:00-07:00","Sent","XXX","XXX","2018-04-05T19:38:41-05:00","XXX-XXX-76954940"

I only want to replace the "", with ", in column 8, that is, where it follows GAUZE PACKING STRIPS 1/4 or ACE WRAP 3, without touching the other "", occurrences on the line.

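I tried sed 's/[[:alnum:]]""//g' file.csv, but it also removes things it should not: the pattern deletes the alphanumeric character in front of the "" together with both quotes.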

Any ideas? Much appreciated!

2 answers:

Answer 0: (score: 2)

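You can use a capture group to match a complete double-quoted field that is immediately followed by an extra double quote, and then replace the whole match with just the field.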

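The regex to match looks like this: ("[^",]*")". Note two things. First, each " is matched literally. Second, the [^",]* expression in the middle matches any run of characters other than " and ,. This keeps the matched string from containing a quote or a field separator, so a match always stays inside a single field.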
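To see which part of a line this pattern selects, a quick check with grep can help. This is only a sketch: the sample line is a shortened, made-up record, and grep -o (print only the matched text) is assumed to be available, as it is in GNU and BSD grep.

```shell
# Shortened sample line (illustrative, not the full record from the question).
line='"588","","GAUZE PACKING STRIPS 1/4"","",""'

# -E selects extended regular expressions; -o prints only the matched text.
# The parentheses are dropped here because nothing is being replaced.
matched=$(printf '%s\n' "$line" | grep -oE '"[^",]*""')
echo "$matched"
```

Only the field carrying the stray quote is reported; the empty "" fields are followed by a comma or the end of the line, so they do not match.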

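Finally, the parentheses form a capture group. Whatever the sub-expression between ( and ) matched can be referenced in the replacement with a backslash and a number: \1 is replaced by the text matched by the first capture group, \3 by the text of the third, and so on.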
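As a small illustration of numbered backreferences, separate from the CSV problem, the following sketch swaps two words. The input string is made up for the example.

```shell
# Swap two comma-separated words using numbered backreferences.
# -E selects extended regular expressions (GNU sed also accepts -r).
swapped=$(printf '%s\n' 'first,second' | sed -E 's/([a-z]+),([a-z]+)/\2,\1/')
echo "$swapped"   # prints: second,first
```

Here \1 holds the text matched by the first pair of parentheses and \2 the text matched by the second.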

The sed script you need might look like this:

sed -re 's/("[^",]*")"/\1/g'

Note how the last double quote sits outside the capture group, so it is not part of \1 and is dropped by the replacement.

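Writing capture groups with plain, unescaped parentheses is extended regular expression (ERE) syntax, so the -r flag is needed to enable it (-E is the more portable spelling of the same option). Without it, sed uses basic regular expressions (BRE), where a group has to be written as \( and \).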
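Note that basic regular expressions can express the same group with escaped parentheses, so the flag is a matter of syntax rather than capability. A minimal comparison, using a shortened made-up line:

```shell
# Shortened sample line (illustrative only).
line='"","GAUZE PACKING STRIPS 1/4"","",""'

# ERE form: plain parentheses, needs -E (or -r in GNU sed).
ere=$(printf '%s\n' "$line" | sed -E 's/("[^",]*")"/\1/g')

# BRE form: no flag, but the group is written with \( and \).
bre=$(printf '%s\n' "$line" | sed 's/\("[^",]*"\)"/\1/g')

echo "$ere"
echo "$bre"
```

Both commands print the same corrected line, "","GAUZE PACKING STRIPS 1/4","","".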

Also note the /g at the end. sed needs this flag to replace every match on a line rather than only the first one.
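The effect of the flag is easiest to see on a line with more than one stray quote. The line below is made up for this sketch:

```shell
# Made-up line with two fields that carry an extra trailing quote.
line='"ACE WRAP 3"","","Sent"","XXX"'

# Without g: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed -E 's/("[^",]*")"/\1/')

# With g: every match on the line is replaced.
all_fixed=$(printf '%s\n' "$line" | sed -E 's/("[^",]*")"/\1/g')

echo "$first_only"   # "ACE WRAP 3","","Sent"","XXX"
echo "$all_fixed"    # "ACE WRAP 3","","Sent","XXX"
```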

Example:

$ cat test
"ABC-DEF-d98263","12345678","176568981","","588","ABC-DEF-11947","","GAUZE PACKING STRIPS 1/4"","","","2019-02-04T19:09:00-05:00",""","XXX","XXX","2019-02-12T23:57:48-06:00"","XXX-XXX-176568981"
$ cat test | sed -re 's/("[^",]*")"/\1/g'
"ABC-DEF-d98263","12345678","176568981","","588","ABC-DEF-11947","","GAUZE PACKING STRIPS 1/4","","","2019-02-04T19:09:00-05:00","","XXX","XXX","2019-02-12T23:57:48-06:00","XXX-XXX-176568981"

Answer 1: (score: 0)

Using awk:

$ awk '
BEGIN { FS=OFS="," }           # set delimiters
{
    if($8!="\"\"")             # if $8 is not empty ie. ""
        sub(/\"\"$/,"\"",$8)   # replace trailing double quotes with a single double quote
}1' file                       # output

Output:

"ABC-DEF-d98263","12345678","176568981","","588","ABC-DEF-11947","","GAUZE PACKING STRIPS 1/4","","","2019-02-04T19:09:00-05:00","","XXX","XXX","2019-02-12T23:57:48-06:00","XXX-XXX-176568981"
"ABC-DEF-d1494751","98765432","98765432","1073552394","284","ABC-DEF-77997","","ACE WRAP 3","","","2015-10-29T18:45:00-07:00","Sent","XXX","XXX","2018-04-05T19:38:41-05:00","XXX-XXX-76954940"
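The two answers differ in scope: the awk script only edits field 8, while the sed command fixes a stray quote in any field. The sketch below shows this on a made-up 13-field line where field 12 also has an extra quote. Both approaches split on every comma, so neither is expected to handle fields that contain commas.

```shell
# Made-up line: field 8 and field 12 both carry an extra trailing quote.
line='"a","b","c","","e","f","","ACE WRAP 3"","","","t",""","x"'

# awk approach from the answer above: only field 8 is touched.
awk_out=$(printf '%s\n' "$line" | awk '
BEGIN { FS = OFS = "," }
{
    if ($8 != "\"\"")          # skip an empty field, i.e. ""
        sub(/""$/, "\"", $8)   # collapse the trailing "" into a single "
}
1')

# sed approach: every quoted field followed by an extra quote is fixed.
sed_out=$(printf '%s\n' "$line" | sed -E 's/("[^",]*")"/\1/g')

echo "$awk_out"   # field 12 is still """
echo "$sed_out"   # field 12 is now ""
```

Which behavior is preferable depends on whether stray quotes outside column 8 should be left alone, as the question asks, or cleaned up as well.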