Question

I have lines in a text file like the following example:

"2009217",2015,3,"N","N","2","UPPER DARBY FIREFIGHTERS "PAC"","","","","7235 WEST CHESTER PIKE","","UPPER DARBY","PA","19082","","6106220269",4245.0100,650.0000,.0000

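Throughout the file, I want to remove every double quote that appears inside a quoted field, as in "UPPER DARBY FIREFIGHTERS "PAC"".

So for each occurrence of such nested double quotes, the result should look like this: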

"2009217",2015,3,"N","N","2","UPPER DARBY FIREFIGHTERS PAC","","","","7235 WEST CHESTER PIKE","","UPPER DARBY","PA","19082","","6106220269",4245.0100,650.0000,.0000

I have come up with this sed line:

cat file.txt | sed "s/\([^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,\)\([^,]*\),\(.*\)/\1\2\3/"

But now I don't know how to remove the double quotes inside \2.

Is this possible with sed?

Answer 1

Personally, I would use awk, because it is more readable:

#!/usr/bin/awk -f
BEGIN {
    # Use ',' as the input and output field delimiter
    FS=OFS=","
}
{
    # Iterate through all fields. (NF is the number of fields.)
    for(i=1;i<=NF;i++) {
        # If the field starts and ends with a '"'
        if($i ~ /^".*"$/) {
            # Remove all '"' from the field
            gsub(/"/,"",$i)
            # Wrap in '"' again
            $i = "\"" $i "\""
        }
    }
    # Print the (possibly modified) record
    print
}
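
As a usage sketch, the same logic can be wrapped in a shell function and tried on a shortened version of the sample line. The function name `fix_with_awk` and the shortened input are made up for illustration. Note that this approach splits on every comma, so it assumes no field contains a comma inside its quotes.

```shell
# Hypothetical wrapper around the awk logic above; reads CSV lines on stdin.
fix_with_awk() {
    awk 'BEGIN { FS = OFS = "," }
    {
        for (i = 1; i <= NF; i++) {
            # Only touch fields that start and end with a double quote
            if ($i ~ /^".*"$/) {
                gsub(/"/, "", $i)      # drop every double quote in the field
                $i = "\"" $i "\""      # put the outer pair back
            }
        }
        print
    }'
}

# Shortened sample record (assumption: real records have the same shape)
printf '%s\n' '"2","UPPER DARBY FIREFIGHTERS "PAC"","",650.0000' | fix_with_awk
```

Empty quoted fields (`""`) and unquoted numeric fields pass through unchanged.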

Answer 2

This might work for you (GNU sed):

sed -r ':a;s/^((([^",]*,)*("[^",]*",([^",]*,)*)*)"[^",]*)"([^,])/\1\6/;ta' file

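This removes the superfluous double quotes from strings that are enclosed in double quotes and separated by commas.

It does this by skipping over the properly constructed double-quoted strings and the unquoted strings (numbers, in this example), and then removing a double quote that is not followed by a comma. The `ta` branch repeats the substitution until no such double quote remains.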

[^",]*,                            # non double quoted strings
"[^",]*",                          # properly quoted strings
(([^",]*,)*("[^",]*",([^",]*,)*)*) # eliminate all properly constructed strings
"[^",]*"([^,])                     # improper double quotes
       ^
       |
       this " is the one that gets removed
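
To see the loop in action, the command can be run on the sample line from the question. This is only a sketch: the function name `strip_inner_quotes` is invented for illustration, and GNU sed is assumed (for `-r` and for the `:a;...;ta` one-liner form).

```shell
# Hypothetical wrapper around the sed command from this answer (GNU sed assumed).
strip_inner_quotes() {
    sed -r ':a;s/^((([^",]*,)*("[^",]*",([^",]*,)*)*)"[^",]*)"([^,])/\1\6/;ta'
}

# Sample line from the question
line='"2009217",2015,3,"N","N","2","UPPER DARBY FIREFIGHTERS "PAC"","","","","7235 WEST CHESTER PIKE","","UPPER DARBY","PA","19082","","6106220269",4245.0100,650.0000,.0000'

printf '%s\n' "$line" | strip_inner_quotes
```

On this input the substitution fires twice: once for the quote before `PAC`, and once for the first of the two quotes after it. The third attempt finds nothing to replace, so `ta` does not branch and the line is printed.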
