Parse a string into a hash

Time: 2014-05-21 11:40:20

Tags: regex perl parsing

I have a string:

<https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5>;
rel="next",
<https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5>;
rel="first",
<https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5>;
rel="last"

So the format is

(<val>; rel="key")*

I want to parse it into a hash with the following format:

next => https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5
first => https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5
last => https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5

In Java, I would use a regex pattern to extract each key => value pair and put them into a map. The pattern would be something like:

<([^>]++)>;\s*rel="([^"]++)"

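This would give me the key in the second match group and the value in the first. Is the same approach the best way to achieve this in Perl, or is there something better I can do?

P.S. The reason I am using Perl rather than Java is that the server does not have Java.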

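3 answers:

Answer 0: (score: 6)

My first inclination was to split the string on commas and work with the three substrings, but it is probably better to use a global match in a while loop.

This should do what you want. (Perl is by far the better tool for text processing!)

Update: I have just realised that your choice of markdown was discarding the angle brackets and newlines. Is this more appropriate? I assume this is a multi-line string?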

use strict;
use warnings;

my $str = <<'END';
<https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5>;
rel="next",
<https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5>;
rel="first",
<https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5>;
rel="last"
END

my %data;
while ($str =~ / < ([^<>]+) >; \s* rel="([^"]+)" (?:,\s*)? /xg) {
  $data{$2} = $1;
}

use Data::Dump;
dd \%data;

Output

{
  first => "https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5",
  last  => "https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5",
  next  => "https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5",
}
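
As a variation on the loop above, a global match in list context returns all captured groups at once, so the hash can be built in a single statement. This is only a minimal sketch and not part of the original answer: the names $header and %links are made up for illustration, and it assumes each rel value appears only once in the string.

```perl
use strict;
use warnings;

# Assumed sample input, same shape as the string in the question.
my $header = join ",\n",
    '<https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5>; rel="next"',
    '<https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5>; rel="first"',
    '<https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5>; rel="last"';

# In list context, //g returns (url, rel, url, rel, ...).
# reverse turns that into (rel, url, ..., rel, url), i.e. key/value pairs.
my %links = reverse $header =~ /<([^<>]+)>;\s*rel="([^"]+)"/g;

# Sort the keys so the printed order is stable between runs.
print "$_ => $links{$_}\n" for sort keys %links;
```

The while loop is easier to extend (for example, to validate each pair), while the one-liner is handy when the input is known to be well formed.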

Answer 1: (score: 4)

You can split the string on ',' and then use map to create the hash:

#!/usr/bin/env perl

use strict;
use warnings;

my $str = 'https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5; rel="next", https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5; rel="first", https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5; rel="last"';

my %hash = map { 
    my ($v, $k) = $_ =~ /\s*([^;]+);\s*rel="([^"]+)".*/; 
    $k => $v;
} split ',', $str;

foreach my $key (keys %hash) {
    print "$key => $hash{$key}\n"
}

Output:

first => https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5
next => https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5
last => https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5

Update

With the new string, you can do:

$str = q(<https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5>; rel="next", <https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5>; rel="first", <https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5>; rel="last");

my %hash = map { 
    my ($v, $k) = $_ =~ /<([^>]+)>;\s*rel="([^"]+)".*/; 
    $k => $v;
} split ',', $str;

This gives the same result.

Answer 2: (score: 1)

use strict;
use warnings;
my $string='https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5; rel="next", https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5; rel="first", https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5; rel="last"';

my @array=split /,/, $string;
my %hash;

foreach(@array)
{
   # ^\s* skips the space left after each comma so it is not captured in $1
   if($_=~/^\s*(.*?);\s*rel\=\s*"([^"]+)"/)
   {
      $hash{$2}=$1;
   }
}

print "$_ =>  $hash{$_}\n" foreach(keys%hash);
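
Perl does not guarantee the order returned by keys %hash, so the lines printed by the loops above can come out in a different order from run to run. If stable output matters, sorting the keys is one option. A minimal sketch, with a small hard-coded %hash standing in for the parsed result and @lines as a made-up helper variable:

```perl
use strict;
use warnings;

# Stand-in for the hash built by any of the answers above.
my %hash = (
    next  => 'https://gitlab.me.com/api/v3/projects/all?page=2&per_page=5',
    first => 'https://gitlab.me.com/api/v3/projects/all?page=1&per_page=5',
    last  => 'https://gitlab.me.com/api/v3/projects/all?page=8&per_page=5',
);

# sort gives a deterministic (alphabetical) order: first, last, next.
my @lines = map { "$_ => $hash{$_}" } sort keys %hash;
print "$_\n" for @lines;
```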