Question

Say I have a plain CSV like this:

# helloworld.csv
hello,world,,,"please don't replace quoted stuff like ,,",,

If I want mysqlimport to understand that some of these fields are NULL, the file has to look like this:

# helloworld.mysql.csv
hello,world,\N,\N,"please don't replace quoted stuff like ,,",\N,\N

I got some help from another question, "Why does sed not replace overlapping patterns", but note the problem:

$ perl -pe 'while (s#,,#,\\N,#) {}' -pe 's/,$/,\\N/g' helloworld.csv
hello,world,\N,\N,"please don't replace quoted stuff like ,\N,",\N,\N
                                                           ^^

How can I write the regexes so that they don't replace ,, inside quotes?

Final answer

Here is the final Perl I ended up using, thanks to the accepted answer below:

perl -pe 's/^,/\\N,/; while (s/,(?=,)(?=(?:[^"]*"[^"]*")*[^"]*$)/,\\N/g) {}; s/,$/,\\N/' helloworld.csv

It handles leading, trailing, and unquoted empty strings.
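As a sketch, the three substitutions from the one-liner above can be wrapped in a subroutine and checked against the sample lines on this page. The name `to_mysql_nulls` is hypothetical, each record is assumed to sit on one line with its newline already removed, and, like the accepted answer, it assumes there are no escaped quotes.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the three substitutions from the final
# one-liner to a single CSV line (trailing newline already removed).
sub to_mysql_nulls {
    my ($line) = @_;
    $line =~ s/^,/\\N,/;    # leading empty field
    # Empty fields in the middle: a comma followed by another comma, but only
    # where an even number of quotes remains on the line (i.e. outside quotes).
    1 while $line =~ s/,(?=,)(?=(?:[^"]*"[^"]*")*[^"]*$)/,\\N/g;
    $line =~ s/,$/,\\N/;    # trailing empty field
    return $line;
}

my @samples = (
    q{hello,world,,,"please don't replace quoted stuff like ,,",,},
    q{foo,,bar,,,baz,"foo,,,oof",zap,,zip},
);
print to_mysql_nulls($_), "\n" for @samples;
```

Because the pattern consumes only the first comma of each pair and checks the second with a lookahead, a single /g pass already handles runs like ,,, ; the `1 while` loop is kept only to mirror the original one-liner.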

Answer 1

Why not use Text::CSV? You can parse the file with it and then use map to replace the empty fields with '\N', for example:

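use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1 }) or die Text::CSV->error_diag();
while (my $line = <>) {
    chomp $line;
    $csv->parse($line) or die "Cannot parse: $line\n";  # parse a CSV string into fields
    my @fields = $csv->fields();                         # get the parsed fields

    # replace every empty field with the literal \N that mysqlimport reads as NULL
    @fields = map { $_ eq "" ? '\N' : $_ } @fields;

    $csv->combine(@fields) or die "Cannot combine fields\n";  # combine fields into a string
    print $csv->string(), "\n";                               # print the combined line
}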

Answer 2

Assuming you have no escaped quotes, you can make sure ,, is only replaced when it is followed by an even number of quotes:

$subject =~
    s/,       # Match ,
    (?=,)     # only if followed by another ,
    (?=       # and only if followed by...
     (?:      # the following group:
      [^"]*"  #  any number of non-quote characters, followed by one quote
      [^"]*"  #  the same thing again (even number!)
     )*       # any number of times, followed by
     [^"]*    # any number of non-quotes until...
     $        # end of string.
    )         # End of lookahead assertion
    /,\\N/xg; # \\N yields a literal \N; a bare \N is a syntax error in the replacement

Input:

foo,,bar,,,baz,"foo,,,oof",zap,,zip

Output:

foo,\N,bar,\N,\N,baz,"foo,,,oof",zap,\N,zip
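To see why the lookahead works, the quote-parity test can be written out explicitly: a comma lies outside quotes when the rest of the line contains an even number of " characters. The helper `comma_is_outside_quotes` below is hypothetical and, like the answer, assumes no escaped quotes; the script also collects the same comma positions using only the lookahead, so the two results can be compared.

```perl
use strict;
use warnings;

# Hypothetical helper: true when an even number of " characters follows
# position $pos, i.e. the comma at $pos is not inside a quoted field
# (assumes no escaped quotes).
sub comma_is_outside_quotes {
    my ($str, $pos) = @_;
    my $rest   = substr($str, $pos + 1);
    my $quotes = () = $rest =~ /"/g;    # count the remaining quote characters
    return $quotes % 2 == 0;
}

my $input = q{foo,,bar,,,baz,"foo,,,oof",zap,,zip};

# Explicit counting: test every comma on the line.
my @by_counting;
while ($input =~ /,/g) {
    my $pos = pos($input) - 1;          # index of the comma just matched
    push @by_counting, $pos if comma_is_outside_quotes($input, $pos);
}

# The same question answered by the lookahead from the answer.
my @by_lookahead;
while ($input =~ /,(?=(?:[^"]*"[^"]*")*[^"]*$)/g) {
    push @by_lookahead, pos($input) - 1;
}

print "counting:  @by_counting\n";
print "lookahead: @by_lookahead\n";
```

Both lists should skip only the three commas inside "foo,,,oof"; the regex simply encodes "an even number of quotes remain" as repeated pairs of quotes followed by quote-free text up to the end of the line.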

How can I use regexes and \N in my CSV so that mysqlimport understands them?
