How do I use regular expressions to put \N in my CSV so that mysqlimport understands it?

Asked: 2011-10-27 17:25:24

Tags: mysql regex perl csv import

Say I have a plain CSV like this:

# helloworld.csv
hello,world,,,"please don't replace quoted stuff like ,,",,

If I want mysqlimport to treat the empty fields as NULL, I need this instead:

# helloworld.mysql.csv
hello,world,\N,\N,"please don't replace quoted stuff like ,,",\N,\N

I got some help from another question, "Why does sed not replace overlapping patterns", but note the problem:

$ perl -pe 'while (s#,,#,\\N,#) {}' -pe 's/,$/,\\N/g' helloworld.csv
hello,world,\N,\N,"please don't replace quoted stuff like ,\N,",\N,\N
                                                           ^^

How do I write the regex so that it does not replace ,, between quotes?

Final answer

This is the final Perl I used, thanks to the accepted answer below:

perl -pe 's/^,/\\N,/; while (s/,(?=,)(?=(?:[^"]*"[^"]*")*[^"]*$)/,\\N/g) {}; s/,$/,\\N/' helloworld.csv

It handles leading, trailing, and unquoted empty strings.
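For readers who want to check the three steps outside of Perl, here is a minimal Python port of the same substitutions. It is only a sketch: the function name `add_nulls` and the constant `EMPTY_MIDDLE` are made up for illustration, and it relies on the same assumption as the Perl version, namely that fields contain no backslash-escaped quotes.

```python
import re

# A comma that is followed by another comma and by an even number of
# double quotes up to the end of the line, i.e. a comma outside quotes.
EMPTY_MIDDLE = re.compile(r',(?=,)(?=(?:[^"]*"[^"]*")*[^"]*$)')


def add_nulls(line: str) -> str:
    r"""Replace leading, middle, and trailing empty fields with \N."""
    line = re.sub(r'^,', r'\\N,', line)      # leading empty field
    line = EMPTY_MIDDLE.sub(r',\\N', line)   # empty fields in the middle
    line = re.sub(r',$', r',\\N', line)      # trailing empty field
    return line


print(add_nulls('hello,world,,,"please don\'t replace quoted stuff like ,,",,'))
```

Because the match consumes only the first comma of each pair, a single pass already handles runs such as ,,, and no loop is needed in this port.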

2 answers:

Answer 0 (score: 7)

Why not use Text::CSV? You can parse the file with it and then use map to replace the empty fields with '\N', for example:

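use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1 }) or die Text::CSV->error_diag();

# Read lines from the files named on the command line (or from STDIN)
while (my $line = <>) {
    chomp $line;
    $csv->parse($line) or die "Cannot parse line $.";  # parse a CSV string into fields
    my @fields = $csv->fields();                       # get the parsed fields

    @fields = map { $_ eq "" ? '\N' : $_ } @fields;    # empty field becomes \N

    $csv->combine(@fields);                            # combine fields into a string
    print $csv->string(), "\n";                        # emit the rebuilt line
}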
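The same parse, map, and combine idea can be sketched with Python's standard `csv` module. This is an illustrative port, not part of the original answer, and the function name `add_nulls_csv` is hypothetical. Note that a real CSV parser returns an empty string for both an unquoted empty field and a quoted `""`, so this sketch turns both into \N.

```python
import csv
import io


def add_nulls_csv(text: str) -> str:
    r"""Parse CSV text and write it back with empty fields replaced by \N."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        # An empty field becomes the two characters backslash and N.
        writer.writerow([field if field != "" else r"\N" for field in row])
    return out.getvalue()


sample = 'hello,world,,,"please don\'t replace quoted stuff like ,,",,\n'
print(add_nulls_csv(sample), end="")
```

The writer re-quotes only the fields that need it (here, the one containing commas), so the quoted text passes through unchanged.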

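Answer 1 (score: 3)

Assuming you don't have escaped quotes, you can make sure that a comma in ,, is only replaced when it is followed by an even number of quotes: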

$subject =~
    s/,       # Match a comma
    (?=,)     # only if followed by another comma
    (?=       # and only if followed by...
     (?:      # the following group:
      [^"]*"  #  any number of non-quote characters, followed by one quote
      [^"]*"  #  the same thing again (even number!)
     )*       # any number of times, followed by
     [^"]*    # any number of non-quotes until...
     $        # end of string.
    )         # End of lookahead assertion
    /,\\N/xg; # replacement is a double-quoted string, so the backslash is doubled

Input:

foo,,bar,,,baz,"foo,,,oof",zap,,zip

Output:

foo,\N,bar,\N,\N,baz,"foo,,,oof",zap,\N,zip
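The second lookahead is in effect a parity check: a comma lies outside quotes exactly when an even number of double quotes follows it. The short Python sketch below spells that check out by counting quotes. The helper name `is_outside_quotes` is invented for illustration, and the same caveat applies: it assumes no backslash-escaped quotes.

```python
def is_outside_quotes(line: str, index: int) -> bool:
    """True when an even number of double quotes follows position `index`."""
    return line.count('"', index + 1) % 2 == 0


line = 'foo,,bar,"foo,,,oof",zap'
for i, ch in enumerate(line):
    # Look only at commas that are directly followed by another comma.
    if ch == "," and line[i + 1 : i + 2] == ",":
        print(i, is_outside_quotes(line, i))
```

Only the commas for which the check is true are candidates for replacement, which is why the ,,, inside "foo,,,oof" is left alone.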