How to get unique values from an array of hashes in Perl

Date: 2015-01-29 14:49:32

Tags: regex perl grep

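I have an array of hashes, and I need to get the unique values of college_name from this data structure.

I have managed to do this, but it looks like a long-winded process.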

use strict;
use warnings;

use Data::Dumper;
use List::MoreUtils qw(uniq);

my %col_hash    = ();

my $college_ids = [
  {
    'term'         => 'SPRING',
    'city_code'    => '530233',
    'college_id'   => '200',
    'college_name' => 'Arts',
    'course_name'  => 'Drawing',
  },
  {
    'term'         => 'SUMMER',
    'city_code'    => '534233',
    'college_id'   => '300',
    'college_name' => 'COMMERCE',
    'course_name'  => 'FINANCE',
  }
];

foreach my $elem (@$college_ids) {
  if (exists $col_hash{'college_name'}) {
    push(@{ $col_hash{'college_name'} }, $elem->{'college_name'});
  }
  else {
    $col_hash{'college_name'} = [$elem->{'college_name'}];
  }
}

my @unique_college_names = uniq @{ $col_hash{'college_name'} };
warn Dumper(" LONG METHOD  = ", @unique_college_names);

I have to do the same thing for term, college_name and city_code.

Is there another way to achieve the same result?

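3 Answers:

Answer 0: (score: 1)

Unlike most languages, Perl allows you to push to a variable that is currently undefined. It will autovivify an array and set the variable to refer to it.

Here is a short program that demonstrates this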

my $list;
push @$list, qw/ a b c /;
print $list->[1];

Output

b

So there is no need to initialise $list beforehand with my $list = []

This means you can reduce your for loop to

for my $elem (@$college_ids) {
    push @{ $col_hash{college_name} }, $elem->{college_name};
}
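
As a rough sketch of how the same autovivifying push could cover all three fields from the question at once, the loop below collects term, college_name and city_code into one hash of arrays. The third record is made up for illustration (it is not in the question's data), and only the three fields of interest are kept in each record.

```perl
use strict;
use warnings;

# Sample records: the two from the question (trimmed to three fields)
# plus a hypothetical third one that repeats some values.
my $college_ids = [
  { term => 'SPRING', city_code => '530233', college_name => 'Arts' },
  { term => 'SUMMER', city_code => '534233', college_name => 'COMMERCE' },
  { term => 'FALL',   city_code => '530233', college_name => 'Arts' },
];

my %col_hash;    # starts empty; no per-field arrays are created up front

for my $elem (@$college_ids) {
  # push autovivifies $col_hash{$field} as an array reference
  # the first time each field is seen.
  push @{ $col_hash{$_} }, $elem->{$_} for qw(term college_name city_code);
}

print "$_: @{ $col_hash{$_} }\n" for sort keys %col_hash;
```

Each list still contains duplicates at this point, so something like uniq from List::MoreUtils (as used in the question) would still have to be applied to every list afterwards.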

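However, I think it is simplest to use a hash of hashes to keep track of the unique values for each category. This program again uses autovivification, this time to increment hash elements that may not exist yet. After the loop, each value in the hash is the number of times that value occurred for that category, but in this case you are not interested in the counts; you only need to list the (unique) keys of the hash for each category.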

use strict;
use warnings;

my %col_hash;

my $college_ids = [
  {
    'term'         => 'SPRING',
    'city_code'    => '530233',
    'college_id'   => '200',
    'college_name' => 'Arts',
    'course_name'  => 'Drawing',
  },
  {
    'term'         => 'SUMMER',
    'city_code'    => '534233',
    'college_id'   => '300',
    'college_name' => 'COMMERCE',
    'course_name'  => 'FINANCE',
  }
];

my %unique;

for my $elem (@$college_ids) {
  while (my ($key, $val) = each %$elem) {
    ++$unique{$key}{$val};
  }
}

for my $field ( qw/ term college_name city_code / ) {
  print "$field\n";
  print "  $_\n" for sort keys %{ $unique{$field} };
  print "\n";
}

Output

term
  SPRING
  SUMMER

college_name
  Arts
  COMMERCE

city_code
  530233
  534233
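
To make the point about counts more concrete, here is a small sketch with made-up data in which one record is deliberately repeated. It builds the same kind of counting hash, then copies the keys into plain sorted lists; the names @records and %values are only illustrative.

```perl
use strict;
use warnings;

# Hypothetical data: the third record repeats values on purpose.
my @records = (
  { term => 'SPRING', college_name => 'Arts' },
  { term => 'SUMMER', college_name => 'COMMERCE' },
  { term => 'SPRING', college_name => 'Arts' },
);

my %unique;
for my $elem (@records) {
  for my $key (keys %$elem) {
    # Autovivifies the inner hash and the counter as needed.
    ++$unique{$key}{ $elem->{$key} };
  }
}

# Turn the counting hashes into plain sorted lists, one per field.
my %values;
for my $field (keys %unique) {
  $values{$field} = [ sort keys %{ $unique{$field} } ];
}

for my $field (sort keys %values) {
  print "$field: @{ $values{$field} }\n";
}
print "SPRING was seen $unique{term}{SPRING} times\n";
```

The counts come for free with this layout, so they can simply be ignored when only the distinct values matter.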

Answer 1: (score: 1)

Borodin's answer is almost there, but it is best to avoid using each.

In this case, dropping each also makes the code shorter:

use strict;
use warnings;

my $college_ids = [
  {
    'term'         => 'SPRING',
    'city_code'    => '530233',
    'college_id'   => '200',
    'college_name' => 'Arts',
    'course_name'  => 'Drawing',
  },
  {
    'term'         => 'SUMMER',
    'city_code'    => '534233',
    'college_id'   => '300',
    'college_name' => 'COMMERCE',
    'course_name'  => 'FINANCE',
  }
];

my %unique;
for my $elem (@$college_ids) {
  ++$unique{$_}{$elem->{$_}} for keys %$elem;
}

for my $field (qw(term college_name city_code)) {
  print "$field\n";
  print "  $_\n" for sort keys %{ $unique{$field} };
  print "\n";
}
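
One commonly cited reason for the advice about each is that every hash carries a single iterator, and it is not reset when a loop stops early. The sketch below, using a throwaway hash %h, shows a second pass silently missing an entry, while iterating over keys always starts from the beginning.

```perl
use strict;
use warnings;

my %h = (a => 1, b => 2, c => 3);

# Take one pair and stop, as an early "last" out of a loop would.
my ($first_key) = each %h;

# The iterator was not reset, so this loop only sees what is left.
my $seen = 0;
while (my ($k, $v) = each %h) {
  $seen++;
}

# Iterating over keys resets the iterator and visits every entry.
my $count = 0;
$count++ for keys %h;

print "each-based second pass saw $seen entries\n";
print "keys-based pass saw $count entries\n";
```

In the programs above nothing leaves the each loop early, so they are not affected; the keys form just removes the possibility.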

Answer 2: (score: -1)

I did it with this one line. No loops.

my %uniq_colleges = map { $_->{'college_name'} => 1 } @$college_ids;

Later, keys %uniq_colleges gives me the list of unique college names.
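
Note that keys returns the names in no particular order. If first-seen order matters, or the same thing is needed for several fields, a variant along these lines might help. unique_values is a hypothetical helper written for this sketch, not a function from any module, and it assumes every record has the requested field.

```perl
use strict;
use warnings;

# unique_values is a hypothetical helper, not part of any module.
# It returns the distinct values of $field in first-seen order.
sub unique_values {
  my ($records, $field) = @_;
  my %seen;
  return grep { !$seen{$_}++ } map { $_->{$field} } @$records;
}

# Made-up sample data; the third record repeats the first on purpose.
my $college_ids = [
  { term => 'SPRING', city_code => '530233', college_name => 'Arts' },
  { term => 'SUMMER', city_code => '534233', college_name => 'COMMERCE' },
  { term => 'SPRING', city_code => '530233', college_name => 'Arts' },
];

for my $field (qw(term college_name city_code)) {
  my @values = unique_values($college_ids, $field);
  print "$field: @values\n";
}
```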

Thanks