我正在尝试从输入字符串创建一个Perl哈希,但是我遇到了原始'split'的问题,因为值可能包含引号。下面是一个示例输入字符串,以及我的(所需)结果哈希:
my $command = 'CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,MOB,123,KEY,VALUE":TIME,"08:01:59":FIN,0';
my %hash =
(
CREATE => '',
USER => '',
TEL => '12345678',
MOB => '444001122',
Type => 'Whatever',
ATTRIBUTES => 'ID,0,MOB,123,KEY,VALUE',
TIME => '08:01:59',
FIN => '0',
);
输入字符串具有任意长度,并且未设置键数。
谢谢!
-hq
答案 0 :(得分:5)
使用Text::CSV。它正确处理逗号分隔值文件。
标准模块似乎无法解析您的输入格式,即使使用sep_char
和allow_loose_quotes
也是如此。因此,您必须自己进行繁重的工作,但仍然可以使用Text :: CSV来解析每个键值对:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw(say);
use Data::Dumper;
use Text::CSV;
my $command = 'CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,KEY,VALUE":TIME,"08:01:59":FIN,0';
my @fields = split /:/, $command;
my %hash;
my $csv = Text::CSV->new();
my $i = 0;
while ($i <= $#fields) {
if (1 == $fields[$i] =~ y/"//) {
my $j = $i;
$fields[$i] .= ':' . $fields[$j] until 1 == $fields[++$j] =~ y/"//;
$fields[$i] .= ':' . $fields[$j];
splice @fields, $i + 1, $j - $i, ();
}
$csv->parse($fields[$i]);
my ($key, $value) = $csv->fields;
$hash{$key} = "$value"; # quotes turn undef to q()
$i++;
}
print Dumper \%hash;
答案 1 :(得分:3)
据我所知,最明显的候选人 - Text::CSV
- 将无法正确处理这种格式,因此本土的正则表达式解决方案是唯一的。
use strict;
use warnings;
my $command = 'CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,KEY,VALUE":TIME,"08:01:59":FIN,0';
my %config;
for my $field ($command =~ /(?:"[^"]*"|[^:])+/g) {
my ($key, $val) = split /,/, $field, 2;
($config{$key} = $val // '') =~ s/"([^"]*)"/$1/;
}
use Data::Dumper;
print Data::Dumper->Dump([\%config], ['*config']);
<强>输出强>
%config = (
'TIME' => '08:01:59',
'MOB' => '444001122',
'Type' => 'Whatever',
'CREATE' => '',
'TEL' => '12345678',
'ATTRIBUTES' => 'ID,0,KEY,VALUE',
'USER' => '',
'FIN' => '0'
);
如果你有Perl v5.10或更高版本,那么你就拥有方便的(?| ... )
正则表达式组,可以让你写这个
use 5.010;
use warnings;
my $command = 'CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,KEY,VALUE":TIME,"08:01:59":FIN,0';
my %config = $command =~ /(\w+) (?| , " ([^"]*) " | , ([^:"]*) | () )/gx;
use Data::Dumper;
print Data::Dumper->Dump([\%config], ['*config']);
与上面的代码产生相同的结果。
答案 2 :(得分:2)
这看起来像Text::ParseWords
可以处理的事情。 quotewords
子例程将分隔符:
上的输入分开,忽略引号内的分隔符。这将为我们提供项目的基本列表,在输出中首先显示为$VAR1
。在此之后,使用正则表达式解析逗号分隔的项目是一件简单的事情,该正则表达式将处理可选的第二次捕获以容纳空标记,例如CREATE
和USER
的标记。
use strict;
use warnings;
use Data::Dumper;
use Text::ParseWords;
while (<DATA>) {
chomp;
my @list = quotewords(':', 0, $_);
my %hash = map { my ($k, $v) = /([^,]+),?(.*)/; $k => $v; } @list;
print Dumper \@list, \%hash;
}
__DATA__
CREATE:USER:TEL,12345678:MOB,444001122:Type,Whatever:ATTRIBUTES,"ID,0,KEY,VALUE":TIME,"08:01:59":FIN,0
<强>输出:强>
$VAR1 = [
'CREATE',
'USER',
'TEL,12345678',
'MOB,444001122',
'Type,Whatever',
'ATTRIBUTES,ID,0,KEY,VALUE',
'TIME,08:01:59',
'FIN,0'
];
$VAR2 = {
'TIME' => '08:01:59',
'MOB' => '444001122',
'Type' => 'Whatever',
'CREATE' => '',
'TEL' => '12345678',
'ATTRIBUTES' => 'ID,0,KEY,VALUE',
'USER' => '',
'FIN' => '0'
};
答案 3 :(得分:0)
my %hash = $command =~ /([^:,]+)(?:,((?:[^:"]|"[^"]*")*))?/g;
s/"([^"]*)"/$1/g
for grep defined, values %hash;