Say I have an ordinary CSV like this:
# helloworld.csv
hello,world,,,"please don't replace quoted stuff like ,,",,
If I want mysqlimport to understand that some of those fields are NULL, then I need:
# helloworld.mysql.csv
hello,world,\N,\N,"please don't replace quoted stuff like ,,",\N,\N
I got some help from another question (Why does sed not replace overlapping patterns), but note the problem:
$ perl -pe 'while (s#,,#,\\N,#) {}' -pe 's/,$/,\\N/g' helloworld.csv
hello,world,\N,\N,"please don't replace quoted stuff like ,\N,",\N,\N
                                                           ^^
How can I write the regex so that it does not replace ,, between quotes?
Final answer
Here is the final perl I used, thanks to the accepted answer below:
perl -pe 's/^,/\\N,/; while (s/,(?=,)(?=(?:[^"]*"[^"]*")*[^"]*$)/,\\N/g) {}; s/,$/,\\N/' helloworld.csv
It handles leading, trailing, and unquoted empty strings.
Answer 0 (score: 7)
Why not use Text::CSV? You can parse the file with it, then use map to replace the empty fields with '\N', e.g.
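use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({ binary => 1 }) or die Text::CSV->error_diag();
while (my $line = <>) {                                # read the CSV line by line
    chomp $line;
    $csv->parse($line) or die "cannot parse line $.\n"; # parse a CSV string into fields
    my @fields = $csv->fields();                        # get the parsed fields
    @fields = map { $_ eq "" ? '\N' : $_ } @fields;     # replace empty fields with \N
    $csv->combine(@fields);                             # combine fields into a string
    print $csv->string(), "\n";                         # print the rebuilt line
}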
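Answer 1 (score: 3)
Assuming you have no escaped quotes, you can make sure that a , is only replaced when it is followed by another , and by an even number of quotes: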
$subject =~
    s/,             # Match ,
      (?=,)         # only if followed by another ,
      (?=           # and only if followed by...
        (?:         # the following group:
          [^"]*"    # any number of non-quote characters, followed by one quote
          [^"]*"    # the same thing again (even number!)
        )*          # any number of times, followed by
        [^"]*       # any number of non-quotes until...
        $           # end of string.
      )             # End of lookahead assertion
     /,\\N/xg;
Input:
foo,,bar,,,baz,"foo,,,oof",zap,,zip
Output:
foo,\N,bar,\N,\N,baz,"foo,,,oof",zap,\N,zip