I have a CSV file in the following format:
value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4
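I want to replace the commas inside the outermost quotes of the third field with ';' and remove the inner quotes. I have tried using sed, but nothing I tried helped with replacing the nested quotes.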
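Answer 0 (score: 3)
You need a recursive regular expression to match the nested quotes. The simplest way to change the quotes and commas is an expression substitution (the /e modifier) combined with non-destructive transliteration (tr///r), which is available as of Perl v5.14.
Like this: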
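use strict;
use warnings 'all';
use v5.14;

my $str = 'value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4';

# Match a quoted field, recursing with (?R) for nested quotes, then replace
# commas with semicolons and delete all quotes in the captured text.
$str =~ s{ " ( (?: [^"]++ | (?R) )* ) " }{ $1 =~ tr/,"/;/dr }egx;

print $str, "\n";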
Output:
value1, value2, some text in the; quotes; with commas and nested quotes; some more text, value3, value4
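Note that the output above no longer has the outermost quotes, because tr deletes every quote in the captured text and the match itself consumes the outer pair. If the outer quotes should be kept, the replacement expression can add them back. The following is only a minimal sketch under that assumption; the helper name fix_quoted_fields is hypothetical and not part of the original answer.

```perl
use strict;
use warnings;
use v5.14;    # needed for the /r flag on s/// and tr///

# Hypothetical helper: rewrites every (possibly nested) quoted field in a line.
sub fix_quoted_fields {
    my ($line) = @_;

    # Same recursive pattern as above; the replacement re-adds the outer quotes.
    # tr/,"/;/dr maps ',' to ';', deletes '"', and returns a modified copy.
    return $line =~ s{ " ( (?: [^"]++ | (?R) )* ) " }
                     { '"' . ( $1 =~ tr/,"/;/dr ) . '"' }egxr;
}

say fix_quoted_fields('a, "b, "c", d", e');    # prints: a, "b; c; d", e
```

The /r flag on s/// makes the substitution return the modified copy instead of changing $line in place, which keeps the helper free of side effects.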
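Answer 1 (score: 2)
It can be done like this.
The requirement is that a quoted field contains an even number of inner quotes, with a comma as the field separator.
Note that if the CSV does not meet this requirement, nothing can rescue it; it can never be parsed.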
(?:^|,)\s*\K"([^"]*(?:"[^"]*"[^"]*)+)"(?=\s*(?:,|$))
Formatted:
(?: ^ | , )
\s*
\K
"
(                             # (1 start)
     [^"]*
     (?:                           # Inner, even number of quotes
          "
          [^"]*
          "
          [^"]*
     )+
)                             # (1 end)
"
(?=
     \s*
     (?: , | $ )
)
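To illustrate the requirement, the expanded pattern can be compiled with the /x flag and used as a plain match test before any substitution is attempted. This is only a sketch; the names $QUOTED_FIELD and has_nested_quotes are made up for illustration.

```perl
use strict;
use warnings;

# The pattern above, compiled in extended (/x) mode so the layout can be kept.
my $QUOTED_FIELD = qr{
    (?: ^ | , ) \s* \K
    "
    (                             # (1) field content without the outer quotes
        [^"]*
        (?: " [^"]* " [^"]* )+    # inner quotes must come in pairs
    )
    "
    (?= \s* (?: , | $ ) )
}x;

# Hypothetical helper: true if the line has a quoted field with paired inner quotes.
sub has_nested_quotes {
    my ($line) = @_;
    return $line =~ $QUOTED_FIELD ? 1 : 0;
}

print has_nested_quotes('a, "b, "c", d", e'), "\n";    # 1: inner quotes are paired
print has_nested_quotes('a, "b, c", d'), "\n";         # 0: no inner quotes at all
print has_nested_quotes('a, "b "c, d", e'), "\n";      # 0: odd number of inner quotes
```

A line for which the helper returns 0 is left untouched by the substitution, so a check like this can be used to report lines that do not meet the requirement.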
Perl example:
use strict;
use warnings;

my $data = 'value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4';

# Replace commas with semicolons and delete the inner quotes, then restore the outer quotes.
sub innerRepl
{
    my ($in) = @_;
    return '"' . ($in =~ tr/,"/;/dr ) . '"';    # /r (Perl 5.14+) returns a modified copy
}

$data =~ s/(?:^|,)\s*\K"([^"]*(?:"[^"]*"[^"]*)+)"(?=\s*(?:,|$))/ innerRepl( $1 ) /eg;
print $data;
Output:
value1, value2, "some text in the; quotes; with commas and nested quotes; some more text", value3, value4
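Since the question is about a whole CSV file, the same substitution can be applied line by line. The sketch below reads from an in-memory filehandle so that it is self-contained; with a real file, the filename would be passed to open instead of the scalar reference. The helper name fix_line and the sample data are assumptions for illustration.

```perl
use strict;
use warnings;
use v5.14;    # tr///r needs Perl 5.14 or later

# Hypothetical helper: applies the substitution shown above to one line.
sub fix_line {
    my ($line) = @_;
    # Brace delimiters are used so that tr/.../.../ can appear in the replacement.
    $line =~ s{(?:^|,)\s*\K"([^"]*(?:"[^"]*"[^"]*)+)"(?=\s*(?:,|$))}
              { '"' . ( $1 =~ tr/,"/;/dr ) . '"' }eg;
    return $line;
}

# Sample data standing in for the CSV file.
my $csv = <<'END';
a, "b, "c", d", e
f, g, h
END

open my $fh, '<', \$csv or die "Cannot open in-memory file: $!";
while ( my $line = <$fh> ) {
    chomp $line;
    say fix_line($line);
}
close $fh;
```

Lines without a matching quoted field, such as the second sample line, pass through unchanged.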