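Perl: using split but ignoring delimiters inside quotes

Time: 2013-02-02 13:27:27

Tags: regex perl hash split

I am trying to build a Perl hash from an input string, but a plain 'split' does not work because the values may contain quoted delimiters. Below is a sample input string and the hash I want to end up with: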

my $command = 'CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,MOB,123,KEY,VALUE":TIME,"08:01:59":FIN,0';

my %hash = 
  (
   CREATE     => '',
   USER       => '',
   TEL        => '12345678',
   MOB        => '444001122',
   Type       => 'Whatever',
   ATTRIBUTES => 'ID,0,MOB,123,KEY,VALUE',
   TIME       => '08:01:59',
   FIN        => '0',
  );

The input string can be of arbitrary length, and the number of keys is not fixed.
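For context, here is a minimal sketch of why a plain split falls short. It uses a shortened version of the input string, and the variable name @naive is only illustrative. Splitting on : also cuts the quoted TIME value at its inner colons:

```perl
use strict;
use warnings;

# Shortened version of the sample input (assumption: same format).
my $command = 'CREATE:USER:TEL,12345678:TIME,"08:01:59":FIN,0';

# A plain split knows nothing about quotes, so the quoted
# TIME value is cut at each of its inner colons.
my @naive = split /:/, $command;

print scalar(@naive), " fields\n";
print "$_\n" for @naive;
```

The TIME entry comes back as three separate pieces instead of one, so any key/value parsing done afterwards sees broken fields.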

Thanks!

-hq

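4 answers:

Answer 0 (score: 5)

Use Text::CSV. It handles comma-separated value files correctly.

Update

The standard module does not seem able to parse your input format, even with sep_char and allow_loose_quotes. So you have to do the heavy lifting yourself, but you can still use Text::CSV to parse each key-value pair: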

#!/usr/bin/perl
use warnings;
use strict;
use feature qw(say);

use Data::Dumper;

use Text::CSV;

my $command = 'CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,KEY,VALUE":TIME,"08:01:59":FIN,0';

my @fields = split /:/, $command;
my %hash;
my $csv = Text::CSV->new();

my $i = 0;
while ($i <= $#fields) {
    # Exactly one quote means the split cut a quoted value in two.
    if (1 == $fields[$i] =~ y/"//) {
        my $j = $i;
        # Re-join the following pieces up to the one holding the closing quote.
        $fields[$i] .= ':' . $fields[$j] until 1 == $fields[++$j] =~ y/"//;
        $fields[$i] .= ':' . $fields[$j];
        splice @fields, $i + 1, $j - $i, ();
    }
    $csv->parse($fields[$i]);
    my ($key, $value) = $csv->fields;
    $hash{$key} = $value // ''; # a key without a value gets q() instead of undef
    $i++;
}

print Dumper \%hash;

Answer 1 (score: 3)

As far as I can tell, the most obvious candidate, Text::CSV, won't handle this format correctly, so a home-grown regex solution is the only option.

use strict;
use warnings;

my $command = 'CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,KEY,VALUE":TIME,"08:01:59":FIN,0';

my %config;
for my $field ($command =~ /(?:"[^"]*"|[^:])+/g) {
  my ($key, $val) = split /,/, $field, 2;
  ($config{$key} = $val // '') =~ s/"([^"]*)"/$1/;
}

use Data::Dumper;
print Data::Dumper->Dump([\%config], ['*config']);

Output

%config = (
            'TIME' => '08:01:59',
            'MOB' => '444001122',
            'Type' => 'Whatever',
            'CREATE' => '',
            'TEL' => '12345678',
            'ATTRIBUTES' => 'ID,0,KEY,VALUE',
            'USER' => '',
            'FIN' => '0'
          );

If you have Perl v5.10 or later, you also have the convenient (?| ... ) branch-reset regex group, which lets you write this:

use 5.010;
use warnings;

my $command = 'CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,KEY,VALUE":TIME,"08:01:59":FIN,0';

my %config = $command =~ /(\w+) (?| , " ([^"]*) " | , ([^:"]*) | () )/gx;

use Data::Dumper;
print Data::Dumper->Dump([\%config], ['*config']);

This produces the same result as the code above.
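As a small illustration of what the branch-reset group does, here is a sketch with made-up test strings (not taken from the answer). Inside (?| ... ) every alternative numbers its captures from the same index, so whichever branch matches, the value lands in $1:

```perl
use strict;
use warnings;

my @out;
for my $item ('"08:01:59"', 'Whatever') {
    # First branch: quoted value; second branch: bare value.
    # Thanks to (?| ... ), both branches capture into $1.
    if ($item =~ /\A (?| " ([^"]*) " | ([^"]*) ) \z/x) {
        push @out, $1;
    }
}

print "$_\n" for @out;
```

With an ordinary (?: ... ) group the bare value would end up in $2 instead, and the list returned by a /g match could no longer be assigned straight to a hash as key/value pairs.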

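Answer 2 (score: 2)

This looks like something Text::ParseWords can handle. The quotewords subroutine splits the input on the delimiter :, ignoring delimiters inside quotes. That gives us the basic list of items, shown first in the output as $VAR1. After that, it is a simple matter of parsing the comma-separated items with a regex whose second capture is optional, to accommodate valueless tags such as CREATE and USER.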

use strict;
use warnings;
use Data::Dumper;
use Text::ParseWords;

while (<DATA>) {
    chomp;
    my @list = quotewords(':', 0, $_);
    my %hash = map { my ($k, $v) = /([^,]+),?(.*)/; $k => $v; } @list;
    print Dumper \@list, \%hash;
}

__DATA__
CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,KEY,VALUE":TIME,"08:01:59":FIN,0

Output:

$VAR1 = [
          'CREATE',
          'USER',
          'TEL,12345678',
          'MOB,444001122',
          'Type,Whatever',
          'ATTRIBUTES,ID,0,KEY,VALUE',
          'TIME,08:01:59',
          'FIN,0'
        ];
$VAR2 = {
          'TIME' => '08:01:59',
          'MOB' => '444001122',
          'Type' => 'Whatever',
          'CREATE' => '',
          'TEL' => '12345678',
          'ATTRIBUTES' => 'ID,0,KEY,VALUE',
          'USER' => '',
          'FIN' => '0'
        };
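To illustrate the second argument of quotewords (the keep flag), here is a small sketch on a shortened input. The variable names @stripped and @kept are illustrative and not part of the answer. With keep set to 0 the quotes are removed from the returned fields, and with 1 they are left in place, which is presumably why the answer passes 0:

```perl
use strict;
use warnings;
use Text::ParseWords;

# Shortened input in the same format as the question (assumption).
my $line = 'TIME,"08:01:59":FIN,0';

# keep = 0: quotes are removed from the returned fields
my @stripped = quotewords(':', 0, $line);

# keep = 1: quotes are left in place
my @kept = quotewords(':', 1, $line);

print "$_\n" for @stripped, @kept;
```

Either way the colons inside the quoted value are not treated as delimiters, so both lists contain two fields.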

Answer 3 (score: 0)

my %hash = $command =~ /([^:,]+)(?:,((?:[^:"]|"[^"]*")*))?/g;
s/"([^"]*)"/$1/g
   for grep defined, values %hash;
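Note that with this regex, keys without a value (CREATE, USER) end up as undef rather than the empty string requested in the question, because the optional group does not participate in the match. A minimal sketch of one way to normalize them, using a shortened input string and assuming Perl 5.10 or later for the //= operator:

```perl
use strict;
use warnings;

# Shortened version of the sample input (assumption: same format).
my $command = 'CREATE:USER:TEL,12345678:TIME,"08:01:59":FIN,0';

my %hash = $command =~ /([^:,]+)(?:,((?:[^:"]|"[^"]*")*))?/g;

# values() returns aliases, so editing $value updates the hash in place.
for my $value (values %hash) {
    $value //= '';                 # valueless keys become ''
    $value =~ s/"([^"]*)"/$1/g;    # drop the surrounding quotes
}

print "$_ => '$hash{$_}'\n" for sort keys %hash;
```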