我有一系列"字" (字符串),由"字母表中的字母组成"用户定义的序列。例如,我的"字母表"以"ʔ ʕ b g d"
开头,所以"单词列表" (bʔd ʔbg ʕʔb bʕd)
之后的sort by_my_alphabet
应为ʔbd ʕʔb bʔd bʕd
。
sort by_my_alphabet (bʔd ʔbg ʕʔb bʕd) # gives ʔbd ʕʔb bʔd bʕd
有没有办法用by_my_alphabet
和$a
制作一个简单的子例程$b
来解决这个问题?
答案 0 :(得分:4)
简单,非常快,因为它不使用比较回调,但它需要扫描整个字符串:
use utf8;
my @my_chr = split //, "ʔʕbgd";
my %my_ord = map { $my_chr[$_] => $_ } 0..$#my_chr;
my @sorted =
map { join '', @my_chr[ unpack 'W*', $_ ] } # "\x00\x01\x02\x03\x04" ⇒ "ʔʕbgd"
sort
map { pack 'W*', @my_ord{ split //, $_ } } # "ʔʕbgd" ⇒ "\x00\x01\x02\x03\x04"
@unsorted;
针对长字符串进行了优化,因为它只扫描字符串直到找到差异:
use utf8;
use List::Util qw( min );
my @my_chr = split //, "ʔʕbgd";
my %my_ord = map { $my_chr[$_] => $_ } 0..$#my_chr;
sub my_cmp($$) {
for ( 0 .. ( min map length($_), @_ ) - 1 ) {
my $cmp = $my_ord{substr($_[0], $_, 1)} <=> $my_ord{substr($_[1], $_, 1)};
return $cmp if $cmp;
}
return length($_[0]) <=> length($_[1]);
}
my @sorted = sort my_cmp @unsorted;
两者都应该比Sobrique's更快。他们使用比较回调,并扫描被比较的整个字符串。
答案 1 :(得分:3)
是。
sort
可以使用任何返回相对排序位置的函数。您所需要的只是一个能够正确查找“排序值”的功能。用于比较的字符串
所以你需要做的就是定义一个相对权重&#39;你的额外字母,然后比较两个。
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my @sort_order = qw ( B C A D );
my @array_to_sort = qw ( A B C D A B C D AB BB CCC ABC );
my $count = 0;
my %position_of;
$position_of{$_} = $count++ for @sort_order;
print Dumper \%position_of;
sub sort_by_pos {
my @a = split //, $a;
my @b = split //, $b;
#iterate one letter at a time, using 'shift' to take it off the front
#of the array.
while ( @a and @b ) {
my $result = $position_of{shift @a} <=> $position_of{shift @b};
#result is 'true' if it's "-1" or "1" which indicates relative position.
# 0 is false, and that'll cause the next loop iteration to test the next
#letter-pair
return $result if $result;
}
#return a value based on remaining length - longest 'string' will sort last;
#That's so "AAA" comparing with "AA" comparison actually work,
return scalar @a <=> scalar @b;
}
my @new = sort { sort_by_pos } @array_to_sort;
print Dumper \@new;
一个简单的案例,但它将我们的数组排序为:
$VAR1 = [
'B',
'B',
'BB',
'C',
'C',
'CCC',
'A',
'A',
'AB',
'ABC',
'D',
'D'
];