Perl按用户定义的字母序列对单词进行排序

时间:2017-09-27 15:53:58

标签: perl sorting

我有一系列"字" (字符串),由"字母表中的字母组成"用户定义的序列。例如,我的"字母表"以"ʔ ʕ b g d"开头,所以"单词列表" (bʔd ʔbg ʕʔb bʕd)之后的sort by_my_alphabet应为ʔbd ʕʔb bʔd bʕd

sort by_my_alphabet (bʔd ʔbg ʕʔb bʕd) # gives ʔbd ʕʔb bʔd bʕd

有没有办法用by_my_alphabet$a制作一个简单的子例程$b来解决这个问题?

2 个答案:

答案 0 :(得分:4)

简单,非常快,因为它不使用比较回调,但它需要扫描整个字符串:

use utf8;

my @my_chr = split //, "ʔʕbgd";
my %my_ord = map { $my_chr[$_] => $_ } 0..$#my_chr;

my @sorted =
   map { join '', @my_chr[ unpack 'W*', $_ ] }   # "\x00\x01\x02\x03\x04" ⇒ "ʔʕbgd"
   sort
   map { pack 'W*', @my_ord{ split //, $_ } }    # "ʔʕbgd" ⇒ "\x00\x01\x02\x03\x04"
   @unsorted;

针对长字符串进行了优化,因为它只扫描字符串直到找到差异:

use utf8;

use List::Util qw( min );

my @my_chr = split //, "ʔʕbgd";
my %my_ord = map { $my_chr[$_] => $_ } 0..$#my_chr;

sub my_cmp($$) {
   for ( 0 .. ( min map length($_), @_ ) - 1 ) {
      my $cmp = $my_ord{substr($_[0], $_, 1)} <=> $my_ord{substr($_[1], $_, 1)};
      return $cmp if $cmp;
   }

   return length($_[0]) <=> length($_[1]);
}

my @sorted = sort my_cmp @unsorted;

两者都应该比Sobrique's更快。他们使用比较回调,并扫描被比较的整个字符串。

答案 1 :(得分:3)

是。

sort可以使用任何返回相对排序位置的函数。您所需要的只是一个能够正确查找“排序值”的功能。用于比较的字符串

所以你需要做的就是定义一个相对权重&#39;你的额外字母,然后比较两个。

#!/usr/bin/env perl

use strict;
use warnings; 

use Data::Dumper;

my @sort_order = qw ( B C A D ); 

my @array_to_sort = qw ( A B C D A B C D AB BB CCC ABC ); 

my $count = 0; 
my %position_of;
$position_of{$_} = $count++ for @sort_order;

print Dumper \%position_of;

sub sort_by_pos {

   my @a = split //, $a;
   my @b = split //, $b; 

   #iterate one letter at a time, using 'shift' to take it off the front
   #of the array.
   while ( @a and @b ) {
     my $result = $position_of{shift @a} <=> $position_of{shift @b};
     #result is 'true' if it's "-1" or "1" which indicates relative position.
     # 0 is false, and that'll cause the next loop iteration to test the next
     #letter-pair
     return $result if $result;
   }
   #return a value based on remaining length - longest 'string' will sort last;
   #That's so "AAA" comparing with "AA" comparison actually work, 
   return scalar @a <=> scalar @b;
}


my @new = sort { sort_by_pos } @array_to_sort;

print Dumper \@new;

一个简单的案例,但它将我们的数组排序为:

$VAR1 = [
          'B',
          'B',
          'BB',
          'C',
          'C',
          'CCC',
          'A',
          'A',
          'AB',
          'ABC',
          'D',
          'D'
        ];