List-1 List-2 one one two three three three four four five six six seven eight eighttt nine nine
期待输出
one | one PASS two | * FAIL MISSING three | three PASS * | three FAIL EXTRA four | four PASS five | * FAIL MISSING six | six PASS * | seven FAIL EXTRA eight | eighttt FAIL INVALID nine | nine PASS
实际上,从我当前的解决方案中返回的是对两个修改后的列表的引用,以及对“失败”列表的引用,该列表描述了索引失败为“无失败”,“丢失”,“额外”或“无效“这也是(显然)精细输出。
我目前的解决方案是:
sub compare {
local $thisfound = shift;
local $thatfound = shift;
local @thisorig = @{ $thisfound };
local @thatorig = @{ $thatfound };
local $best = 9999;
foreach $n (1..6) {
local $diff = 0;
local @thisfound = @thisorig;
local @thatfound = @thatorig;
local @fail = ();
for (local $i=0;$i<scalar(@thisfound) || $i<scalar(@thatfound);$i++) {
if($thisfound[$i] eq $thatfound[$i]) {
$fail[$i] = 'NO_FAIL';
next;
}
if($n == 1) { # 1 2 3
next unless __compare_missing__();
next unless __compare_extra__();
next unless __compare_invalid__();
} elsif($n == 2) { # 1 3 2
next unless __compare_missing__();
next unless __compare_invalid__();
next unless __compare_extra__();
} elsif($n == 3) { # 2 1 3
next unless __compare_extra__();
next unless __compare_missing__();
next unless __compare_invalid__();
} elsif($n == 4) { # 2 3 1
next unless __compare_extra__();
next unless __compare_invalid__();
next unless __compare_missing__();
} elsif($n == 5) { # 3 1 2
next unless __compare_invalid__();
next unless __compare_missing__();
next unless __compare_extra__();
} elsif($n == 6) { # 3 2 1
next unless __compare_invalid__();
next unless __compare_extra__();
next unless __compare_missing__();
}
push @fail,'INVALID';
$diff += 1;
}
if ($diff<$best) {
$best = $diff;
@thisbest = @thisfound;
@thatbest = @thatfound;
@failbest = @fail;
}
}
return (\@thisbest,\@thatbest,\@failbest)
}
sub __compare_missing__ {
my $j;
### Does that command match a later this command? ###
### If so most likely a MISSING command ###
for($j=$i+1;$j<scalar(@thisfound);$j++) {
if($thisfound[$j] eq $thatfound[$i]) {
$diff += $j-$i;
for ($i..$j-1) { push(@fail,'MISSING'); }
@end = @thatfound[$i..$#thatfound];
@thatfound = @thatfound[0..$i-1];
for ($i..$j-1) { push(@thatfound,'*'); }
push(@thatfound,@end);
$i=$j-1;
last;
}
}
$j == scalar(@thisfound);
}
sub __compare_extra__ {
my $j;
### Does this command match a later that command? ###
### If so, most likely an EXTRA command ###
for($j=$i+1;$j<scalar(@thatfound);$j++) {
if($thatfound[$j] eq $thisfound[$i]) {
$diff += $j-$i;
for ($i..$j-1) { push(@fail,'EXTRA'); }
@end = @thisfound[$i..$#thisfound];
@thisfound = @thisfound[0..$i-1];
for ($i..$j-1) { push (@thisfound,'*'); }
push(@thisfound,@end);
$i=$j-1;
last;
}
}
$j == scalar(@thatfound);
}
sub __compare_invalid__ {
my $j;
### Do later commands match? ###
### If so most likely an INVALID command ###
for($j=$i+1;$j<scalar(@thisfound);$j++) {
if($thisfound[$j] eq $thatfound[$j]) {
$diff += $j-$i;
for ($i..$j-1) { push(@fail,'INVALID'); }
$i=$j-1;
last;
}
}
$j == scalar(@thisfound);
}
但这并不完美......谁想要简化和改进?具体而言......在单个数据集中,一个搜索顺序对于子集更好,而另一个顺序对于不同的子集更好。
答案 0 :(得分:4)
如果数组包含重复值,则答案要比那复杂得多。
答案 1 :(得分:0)
Perl(和类似语言)中的技巧是哈希,它不关心顺序。
假设第一个数组是保存有效元素的数组。构造一个散列,将这些值作为键:
my @valid = qw( one two ... );
my %valid = map { $_, 1 } @valid;
现在,要查找无效元素,您只需找到不在%valid
哈希中的元素:
my @invalid = grep { ! exists $valid{$_} } @array;
如果你想知道无效元素的数组索引:
my @invalid_indices = grep { ! exists $valid{$_} } 0 .. $#array;
现在,您可以扩展它以查找重复的元素。您不仅要检查%valid
哈希,还要跟踪您已经看到的内容:
my %Seen;
my @invalid_indices = grep { ! exists $valid{$_} && ! $Seen{$_}++ } 0 .. $#array;
重复的有效元素是%Seen
中值大于1的元素:
my @repeated_valid = grep { $Seen{$_} > 1 } @valid;
要查找缺少的元素,请查看%Seen
以查看其中没有的内容。
my @missing = grep { ! $Seen{$_ } } @valid;
答案 2 :(得分:0)
从perlfaq4回答How can I tell whether a certain element is contained in a list or array?:
(这个答案的部分内容由Anno Siegel和brian d foy提供)
听到“in”这个词表示您可能应该使用散列而不是列表或数组来存储数据。哈希旨在快速有效地回答这个问题。数组不是。
话虽如此,有几种方法可以解决这个问题。在Perl 5.10及更高版本中,您可以使用智能匹配运算符来检查项目是否包含在数组或哈希中:
use 5.010;
if( $item ~~ @array )
{
say "The array contains $item"
}
if( $item ~~ %hash )
{
say "The hash contains $item"
}
使用早期版本的Perl,您需要做更多工作。如果要在任意字符串值上多次进行此查询,最快的方法可能是反转原始数组并维护一个哈希,其键是第一个数组的值:
@blues = qw/azure cerulean teal turquoise lapis-lazuli/;
%is_blue = ();
for (@blues) { $is_blue{$_} = 1 }
现在你可以检查$ is_blue {$ some_color}。首先将蓝调全部放在哈希中可能是一个好主意。
如果值都是小整数,则可以使用简单的索引数组。这种数组占用的空间更少:
@primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
@is_tiny_prime = ();
for (@primes) { $is_tiny_prime[$_] = 1 }
# or simply @istiny_prime[@primes] = (1) x @primes;
现在检查$ is_tiny_prime [$ some_number]。
如果有问题的值是整数而不是字符串,则可以使用位字符串来节省相当多的空间:
@articles = ( 1..10, 150..2000, 2017 );
undef $read;
for (@articles) { vec($read,$_,1) = 1 }
现在检查一些$ n的vec($ read,$ n,1)是否为真。
这些方法可以保证快速的单独测试,但需要重新组织原始列表或数组。如果你必须针对同一个数组测试多个值,它们只会得到回报。
如果您只测试一次,标准模块List :: Util会首先导出该功能。它一旦找到元素就停止工作。它是用C语言编写的,它的Perl等价物就像这个子程序:
sub first (&@) {
my $code = shift;
foreach (@_) {
return $_ if &{$code}();
}
undef;
}
如果速度很少受到关注,那么常见的习惯用法在标量上下文中使用grep(它返回通过其条件的项目数)来遍历整个列表。这确实有利于告诉你它找到了多少匹配。
my $is_there = grep $_ eq $whatever, @array;
如果要实际提取匹配元素,只需在列表上下文中使用grep。
my @matches = grep $_ eq $whatever, @array;
答案 3 :(得分:0)
sub compare {
local @d = ();
my $this = shift;
my $that = shift;
my $distance = _levenshteindistance($this, $that);
my @thisorig = @{ $this };
my @thatorig = @{ $that };
my $s = $#thisorig;
my $t = $#thatorig;
@this = ();
@that = ();
@fail = ();
while($s>0 || $t>0) {
# deletion, insertion, substitution
my $min = _minimum($d[$s-1][$t],$d[$s][$t-1],$d[$s-1][$t-1]);
if($min == $d[$s-1][$t-1]) {
unshift(@this,$thisorig[$s]);
unshift(@that,$thatorig[$t]);
if($d[$s][$t] > $d[$s-1][$t-1]) {
unshift(@fail,'INVALID');
} else {
unshift(@fail,'NO_FAIL');
}
$s -= 1;
$t -= 1;
} elsif($min == $d[$s][$t-1]) {
unshift(@this,'*');
unshift(@that,$thatorig[$t]);
unshift(@fail,'EXTRA');
$t -= 1;
} elsif($min == $d[$s-1][$t]) {
unshift(@this,$thisorig[$s]);
unshift(@that,'*');
unshift(@fail,'MISSING');
$s -= 1;
} else {
die("Error! $!");
}
}
return(\@this, \@that, \@fail);
}
sub _minimum {
my $ret = 2**53;
foreach $in (@_) {
$ret = $ret < $in ? $ret : $in;
}
$ret;
}
sub _levenshteindistance {
my $s = shift;
my $t = shift;
my @s = @{ $s };
my @t = @{ $t };
for(my $i=0;$i<scalar(@s);$i++) {
$d[$i] = ();
}
for(my $i=0;$i<scalar(@s);$i++) {
$d[$i][0] = $i # deletion
}
for(my $j=0;$j<scalar(@t);$j++) {
$d[0][$j] = $j # insertion
}
for(my $j=1;$j<scalar(@t);$j++) {
for(my $i=1;$i<scalar(@s);$i++) {
if ($s[$i] eq $t[$j]) {
$d[$i][$j] = $d[$i-1][$j-1];
} else {
# deletion, insertion, substitution
$d[$i][$j] = _minimum($d[$i-1][$j]+1,$d[$i][$j-1]+1,$d[$i-1][$j-1]+1);
}
}
}
foreach $a (@d) {
@a = @{ $a };
foreach $b (@a) {
printf STDERR "%2d ",$b;
}
print STDERR "\n";
}
return $d[$#s][$#t];
}
答案 4 :(得分:-1)
从perlfaq4回答How do I compute the difference of two arrays? How do I compute the intersection of two arrays?:
使用哈希。这是两个以上的代码。它假定每个元素在给定数组中是唯一的:
@union = @intersection = @difference = ();
%count = ();
foreach $element (@array1, @array2) { $count{$element}++ }
foreach $element (keys %count) {
push @union, $element;
push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}
请注意,这是对称差异,即A或B中的所有元素,但两者都不是。把它想象成一个xor操作。