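Replace the double-quote qualifier in data using sed, awk, or perl

Asked: 2018-09-09 08:41:39

Tags: perl awk sed

I have a txt file that uses | as the delimiter and " as the text qualifier. I want to change the qualifier to the ~ symbol. The problem I'm facing is that the actual column values also contain double quotes.

I need to change the qualifier without removing the double quotes inside the column values. Here is a sample record: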

"Live Your Dreams: Be You"|"20 Feb 2018"|"2 formats and editions"|"Are you being swept away by life being busy? Are things seemingly out of your control? Do you want to calm the chaos in your life? Are you ready to transform your life? In 
"Live Your Dreams"
now AMAZON BESTSELLER, readers are shown how to take immediate control of their mental, emotional, physical and entrepreneurial destiny."|"All this and more as you immerse yourself in the story that opens up like scenes from "a Bollywood movie""|"Indian Edition"

I have tried sed and awk, based on posts from Stack Overflow and unix.com, but the double quotes inside the column values cause problems.

Desired output:

~Live Your Dreams: Be You~|~20 Feb 2018~|~2 formats and editions~|~Are you being swept away by life being busy? Are things seemingly out of your control? Do you want to calm the chaos in your life? Are you ready to transform your life? In 
"Live Your Dreams"
now AMAZON BESTSELLER, readers are shown how to take immediate control of their mental, emotional, physical and entrepreneurial destiny.~|~All this and more as you immerse yourself in the story that opens up like scenes from "a Bollywood movie"~|~Indian Edition~

Code I tried: sed 's_"([^ *])"_~\1~_g' data.txt > tdata.txt

Result of the sed command above:

"Live Your Dreams: Be You~|~20 Feb 2018~|~2 formats and editions~|~Are you being swept away by life being busy? Are things seemingly out of your control? Do you want to calm the chaos in your life? Are you ready to transform your life? In 
"Live Your Dreams"
now AMAZON BESTSELLER, readers are shown how to take immediate control of their mental, emotional, physical and entrepreneurial destiny.~|~All this and more as you immerse yourself in the story that opens up like scenes from "a Bollywood movie"~|~Indian Edition~

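Any help with an awk, sed, or Perl script would be much appreciated.

Thanks in advance, 帕布

2 answers:

Answer 0: (score: 0)

What you actually have is malformed CSV data in which the separator character is |.

It is malformed because the "inner" quotes are not escaped: in a CSV field that contains quotes, each quote should be doubled, like this: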

1,2,"field,with,commas","this field ""contains quotes"" that are duplicated"
# ..................................^^...............^^

If you can fix the input data so that it looks like this:

"Live Your Dreams: Be You"|"20 Feb 2018"|"2 formats and editions"|"Are you being swept away by life being busy? Are things seemingly out of your control? Do you want to calm the chaos in your life? Are you ready to transform your life? In 
""Live Your Dreams""
now AMAZON BESTSELLER, readers are shown how to take immediate control of their mental, emotional, physical and entrepreneurial destiny."|"All this and more as you immerse yourself in the story that opens up like scenes from ""a Bollywood movie"""|"Indian Edition"

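where the inner quotes on lines 2 and 3 are properly escaped, then you can use a CSV parser to change the quote character on output. Perl's Text::CSV parser can handle fields that contain newlines: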

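perl -MText::CSV -e '
    # Open the input file named on the command line
    open my $fh, "<:encoding(UTF-8)", shift(@ARGV) or die "Cannot open input: $!";
    # Reader: |-separated, "-quoted; binary => 1 allows newlines inside fields
    my $csv_in = Text::CSV->new({ quote_char => "\"", sep_char => "|", binary => 1 });
    # Writer: same separator, ~ as quote; always_quote forces quoting of every field
    my $csv_out = Text::CSV->new({ quote_char => "~", escape_char => "~", sep_char => "|", binary => 1, always_quote => 1 });
    while (my $row = $csv_in->getline($fh)) {
        $csv_out->say(*STDOUT, $row);
    }
    # Report a parse error if reading stopped before end of file
    $csv_in->eof or $csv_in->error_diag();
' file.csv

Output:
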
~Live Your Dreams: Be You~|~20 Feb 2018~|~2 formats and editions~|~Are you being swept away by life being busy? Are things seemingly out of your control? Do you want to calm the chaos in your life? Are you ready to transform your life? In 
"Live Your Dreams"
now AMAZON BESTSELLER, readers are shown how to take immediate control of their mental, emotional, physical and entrepreneurial destiny.~|~All this and more as you immerse yourself in the story that opens up like scenes from "a Bollywood movie"~|~Indian Edition~
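
For readers without Text::CSV, the same two steps (escape the inner quotes, then re-emit with a new qualifier) can be sketched in Python. This is a heuristic under stated assumptions, not a general repair tool: it assumes a `"` is a qualifier only when it sits at the start or end of the text, directly next to a `|`, or in a `"` newline `"` pair marking a record boundary. Data with an inner quote adjacent to `|` would be misread. The function names `escape_inner_quotes` and `requote` are hypothetical.

```python
import csv
import io
import re

# Assumption: a quote is a qualifier only when it touches a field or record boundary.
QUALIFIER = re.compile(r'''
    \A"            # opens the first field of the text
  | (?<=\|)"       # opens a field right after the separator
  | "(?=\|)        # closes a field right before the separator
  | "(?=\n")       # closes the last field of a record, another record follows
  | (?<="\n)"      # opens the first field of the following record
  | "(?=\n?\Z)     # closes the last field of the text
''', re.VERBOSE)


def escape_inner_quotes(text):
    """Double every quote that the heuristic does not classify as a qualifier."""
    keep = {m.start() for m in QUALIFIER.finditer(text)}
    return "".join('""' if ch == '"' and i not in keep else ch
                   for i, ch in enumerate(text))


def requote(text, sep="|", old_quote='"', new_quote="~"):
    """Parse properly escaped text and write it back with a different qualifier."""
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=sep, quotechar=old_quote)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=sep, quotechar=new_quote,
                        quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Shortened stand-in for the sample record, plus a second record.
raw = ('"Be You"|"In \n"Live Your Dreams"\nnow"|"like "a Bollywood movie""\n'
       '"Second"|"x"\n')
fixed = escape_inner_quotes(raw)
print(requote(fixed), end="")
```

Keeping the repair and the conversion as separate functions means the second step can be used on its own whenever the input is already valid CSV.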

Answer 1: (score: -1)

In Perl, you can try this one-liner:

perl -anF'\|' -E 'for (@F) {s/^"/~/;s/"$/~/} print join "|", @F' file.txt

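This splits each line on | and then replaces a " at the start or end of each field with ~.

Based on new information in the comments: if you want to leave lines that consist of a single field unchanged: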

perl -anF'\|' -E 'if (@F == 1) {print; next} for (@F) {s/^"/~/;s/"$/~/} print join "|", @F' file.txt
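
The same line-by-line logic can be sketched in Python for comparison. This is an illustrative translation of the one-liner above, not a replacement for it, and the function name `swap_qualifiers` is hypothetical. It shares the one-liner's limitation: a continuation line that itself contains | would be treated as a line of ordinary fields.

```python
def swap_qualifiers(line, sep="|", old='"', new="~"):
    """Swap a qualifier at the start or end of each field on one physical line."""
    body = line.rstrip("\n")
    ending = line[len(body):]          # preserve the original line ending
    fields = body.split(sep)
    if len(fields) == 1:
        return line                    # single-field line: leave it unchanged
    fixed = []
    for field in fields:
        if field.startswith(old):
            field = new + field[len(old):]
        if field.endswith(old):
            field = field[:-len(old)] + new
        fixed.append(field)
    return sep.join(fixed) + ending


# Shortened stand-in for the three physical lines of the sample record.
lines = [
    '"Be You"|"20 Feb 2018"|"In \n',
    '"Live Your Dreams"\n',
    'now AMAZON BESTSELLER."|"like "a Bollywood movie""|"Indian Edition"\n',
]
for text in lines:
    print(swap_qualifiers(text), end="")
```

Only the first and last character of each field are examined, so quotes in the middle of a value are never touched.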