Parsing CSV with embedded quotes in a mixed string

Asked: 2016-09-14 13:19:13

Tags: regex perl csv quotes comma

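I've looked around but couldn't find a neat, working solution. I've been trying to use Text::CSV_XS, so I'm not just doing this the hard way with a regex. I may not be able to install Text::CSV easily, but I do have the XS version available.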

I only need to parse out the CSV fields; later I will break them down into key/value pairs.

use Text::CSV_XS;
use Data::Dumper;

my $csv = Text::CSV_XS->new ({ allow_loose_quotes => 1, 
                               allow_whitespace => 1,  
                               eol => $/ });

my $str3 = '09/11/2016 22:05:00 +0000, search_name="ThreatInjection - Rule", search_now=1473644880.000, search="bunchof|stuff1,bunch%of-stuff2", count=100';

my $status  = $csv->parse($str3);
my @details = $csv->fields();
print $csv->error_diag ();
print Dumper(\@details);

This produces:

$VAR1 = [
      '09/11/2016 22:05:00 +0000',
      'search_name="ThreatInjection - Rule"',
      'search_now=1473644880.000',
      'search="bunchof|stuff1',
      'bunch%of-stuff2"',
      'count=100'
    ];

So the problem is getting search="bunchof|stuff1,bunch%of-stuff2" to stay in one field. I'm sure the answer is simple, but I'm a bit stumped. Any help is appreciated.

1 answer:

Answer 0 (score: 1)

You can use Text::ParseWords, which has been included in the standard Perl distribution forever.

#!/usr/bin/perl

use strict;
use warnings;
use Text::ParseWords;
use Data::Dumper;

my $str3 = '09/11/2016 22:05:00 +0000, search_name="ThreatInjection - Rule", search_now=1473644880.000, search="bunchof|stuff1,bunch%of-stuff2", count=100';

my @details = parse_line(',\s*', 1, $str3);

print Dumper \@details;

Output:

$VAR1 = [
          '09/11/2016 22:05:00 +0000',
          'search_name="ThreatInjection - Rule"',
          'search_now=1473644880.000',
          'search="bunchof|stuff1,bunch%of-stuff2"',
          'count=100'
        ];
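
This works because parse_line lets a quoted run anywhere inside a field protect the delimiter, whereas CSV parsers generally treat a quote as special only when it opens the field. For the key/value step mentioned in the question, a minimal sketch could look like the following. It assumes every field after the first has the form key=value, and it stores the leading timestamp under the key "timestamp", a name made up here for illustration. Passing 0 as the keep argument makes parse_line strip the quotes from the values.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use Text::ParseWords;

my $str3 = '09/11/2016 22:05:00 +0000, search_name="ThreatInjection - Rule", search_now=1473644880.000, search="bunchof|stuff1,bunch%of-stuff2", count=100';

# keep = 0: quotes still protect embedded commas, but are removed from the result
my @fields = parse_line(',\s*', 0, $str3);

my %kv;
for my $field (@fields) {
    next unless defined $field;

    # Split on the first '=' only, so values may themselves contain '='
    my ($key, $value) = split /=/, $field, 2;

    if (defined $value) {
        $kv{$key} = $value;
    }
    else {
        # No '=' in the field: assume it is the leading timestamp
        # ('timestamp' is a hypothetical key name, not from the original post)
        $kv{timestamp} = $key;
    }
}

print "$_ => $kv{$_}\n" for sort keys %kv;
```

With this input, $kv{search} holds the whole value bunchof|stuff1,bunch%of-stuff2, comma included. Note that parse_line also treats single quotes as quoting characters, so data containing a stray apostrophe may need extra care.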