Question

if (%hash){
     print "That was a true value!\n";
}
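That will be true if (and only if) the hash has at least one key-value pair.

The actual result is an internal debugging string useful to the people who maintain Perl. It looks something like "4/16," but the value is guaranteed to be true when the hash is nonempty, and false when it's empty. - Llama book

What is this 4/16? Can anyone show me a small program in which I can see the result 4/16?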

Answer 1

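From perldoc perldata:

If you evaluate a hash in scalar context, it returns false if the hash is empty. If there are any key/value pairs, it returns true; more precisely, the value returned is a string consisting of the number of used buckets and the number of allocated buckets, separated by a slash. This is pretty much useful only to find out whether Perl's internal hashing algorithm is performing poorly on your data set. For example, you stick 10,000 things in a hash, but evaluating %HASH in scalar context reveals "1/16", which means only one out of sixteen buckets has been touched, and presumably contains all 10,000 of your items.

So 4/16 would be the count of buckets used/allocated, and something like the following will display this value: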

%hash = (1, 2);
print scalar(%hash); #prints 1/8 here
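
On a newer Perl the snippet above may print 1 rather than 1/8: according to perldata, since Perl 5.25 a hash in scalar context returns the number of keys, and Hash::Util::bucket_ratio gives back the old string. As a minimal sketch that should not depend on that change, the same used/allocated figure can be rebuilt from Hash::Util::bucket_info (assumed here to be available, which as far as I know means Perl 5.18 or later). The helper name used_over_total is made up for illustration.

```perl
use strict;
use warnings;
use Hash::Util ();

# Hypothetical helper: rebuild the "used/allocated" string from bucket_info.
sub used_over_total {
    my ($hash_ref) = @_;
    my ($keys, $buckets, $used) = Hash::Util::bucket_info($hash_ref);
    return $keys ? "$used/$buckets" : 0;
}

my %hash = (1, 2);
print used_over_total(\%hash), "\n";   # one key, so exactly one used bucket
```

The number of allocated buckets is an implementation detail, so only the "one used bucket" part of the output is something to rely on.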

Answer 2

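A hash is an array of linked lists. A hashing function converts the key into a number, which is used as the index of the array element ("bucket") in which to store the value. More than one key can hash to the same index (a "collision"), a situation handled by the linked lists.

The denominator of the fraction is the total number of buckets.

The numerator of the fraction is the number of buckets that hold one or more elements.

For hashes with the same number of elements, the higher the numerator, the better. A hash that returns 6/8 has fewer collisions than one that returns 4/8.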
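
To make the 6/8 versus 4/8 comparison concrete, here is a rough sketch that turns a ratio string plus a key count into the average number of keys per used bucket. The figure of 8 keys is an assumption chosen for illustration, and keys_per_used_bucket is a hypothetical helper, not part of Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: average chain length over the buckets that are in use.
sub keys_per_used_bucket {
    my ($ratio, $key_count) = @_;
    my ($used, $total) = split m{/}, $ratio;
    return $key_count / $used;
}

# Two hashes that both hold 8 keys (assumed figure).
for my $ratio ('6/8', '4/8') {
    printf "%s: %.2f keys per used bucket\n",
        $ratio, keys_per_used_bucket($ratio, 8);
}
```

With the same 8 keys, 6/8 works out to about 1.33 keys per used bucket and 4/8 to 2.00, so lookups in the second hash walk longer lists on average.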

Answer 3

This is a slightly modified version of an email I sent to the Perl Beginners mailing list answering this same question.

Saying

my $hash_info = %hash;

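will get you either 0 (if the hash is empty) or the ratio of used to total buckets. This information is almost, but not quite, useless to you. To understand what it means, you must first understand how hashing works.

Let's implement a hash using Perl 5. The first thing we need is a hashing function. Hashing functions turn strings into, hopefully, unique numbers. Examples of really strong hashing functions are MD5 and SHA1, but they tend to be too slow for common use, so people tend to use weaker functions (i.e. ones that produce less unique output) for hash tables. Perl 5 uses Bob Jenkins' one-at-a-time algorithm, which has a nice tradeoff of uniqueness to speed. For our example I will use a very weak hashing function: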

#!/usr/bin/perl

use strict;
use warnings;

sub weak_hash {
       my $key  = shift;
       my $hash = 1;
       #multiply every character in the string's ASCII/Unicode value together
       for my $character (split //, $key) {
               $hash *= ord $character;
       }
       return $hash;
}

for my $string (qw/cat dog hat/) {
       print "$string hashes to ", weak_hash($string), "\n";
}

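Since hashing functions tend to give back numbers from a larger range than we want, you usually use modulo to reduce the range of numbers they give back: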

#!/usr/bin/perl

use strict;
use warnings;

sub weak_hash {
       my $key  = shift;
       my $hash = 1;
       #multiply every character in the string's ASCII/Unicode value together
       for my $character (split //, $key) {
               $hash *= ord $character;
       }
       return $hash;
}

for my $string (qw/cat dog hat/) {
       # the % operator is constraining the number
       # weak_hash returns to 0 - 10
       print "$string hashes to ", weak_hash($string) % 11, "\n";
}

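Now that we have a hashing function, we need somewhere to save the keys and values. This is called the hash table. The hash table is often an array whose elements are called buckets (these are the buckets that the ratio is talking about). A bucket holds all of the key/value pairs that hash to the same number: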

#!/usr/bin/perl

use strict;
use warnings;

sub weak_hash {
       my $key  = shift;
       my $hash = 1;
       for my $character (split //, $key) {
               $hash *= ord $character;
       }
       return $hash;
}

sub create {
       my ($size) = @_;

       my @hash_table;

       #set the size of the array
       $#hash_table = $size - 1;

       return \@hash_table;
}


sub store {
       my ($hash_table, $key, $value) = @_;

       #create an index into $hash_table
       #constrain it to the size of the hash_table
       my $hash_table_size = @$hash_table;
       my $index           = weak_hash($key) % $hash_table_size;

       #push the key/value pair onto the bucket at the index
       push @{$hash_table->[$index]}, {
               key   => $key,
               value => $value
       };

       return $value;
}

sub retrieve {
       my ($hash_table, $key) = @_;

       #create an index into $hash_table
       #constrain it to the size of the hash_table
       my $hash_table_size = @$hash_table;
       my $index           = weak_hash($key) % $hash_table_size;

       #get the bucket for this key/value pair
       my $bucket = $hash_table->[$index];

       #find the key/value pair in the bucket
       for my $pair (@$bucket) {
               return $pair->{value} if $pair->{key} eq $key;
       }

       #if key isn't in the bucket:
       return undef;
}

sub list_keys {
       my ($hash_table) = @_;

       my @keys;

       for my $bucket (@$hash_table) {
               for my $pair (@$bucket) {
                       push @keys, $pair->{key};
               }
       }

       return @keys;
}

sub print_hash_table {
       my ($hash_table) = @_;

       for my $i (0 .. $#$hash_table) {
               print "in bucket $i:\n";
               for my $pair (@{$hash_table->[$i]}) {
                       print "$pair->{key} => $pair->{value}\n";
               }
       }
}

my $hash_table = create(3);

my $i = 0;
for my $key (qw/a b c d g j/) {
       store($hash_table, $key, $i++);
}
print_hash_table($hash_table);

print "the a key holds: ", retrieve($hash_table, "a"), "\n";

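From this example we can see that one bucket may have more key/value pairs than the others. This is a bad situation to be in: it makes the hash slow for that bucket. This is one of the uses of the ratio of used to total buckets that hashes return in scalar context. If the hash says that only a few buckets are in use, but there are lots of keys in the hash, then you know you have a problem.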
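
As a small sketch of how that ratio could be computed for the toy table, the code below counts the non-empty buckets and reports them against the table size. It reuses the weak_hash function from the example; build_table and toy_ratio are hypothetical helpers written for this illustration, and values are left out to keep it short.

```perl
use strict;
use warnings;

# Same deliberately weak hash function as in the example above.
sub weak_hash {
    my $key  = shift;
    my $hash = 1;
    $hash *= ord $_ for split //, $key;
    return $hash;
}

# Store keys in a toy table of $size buckets (values omitted for brevity).
sub build_table {
    my ($size, @keys) = @_;
    my @table = map { [] } 1 .. $size;
    push @{ $table[ weak_hash($_) % $size ] }, $_ for @keys;
    return \@table;
}

# Hypothetical helper: "used/total" for the toy table, or 0 when it is empty.
sub toy_ratio {
    my ($table) = @_;
    my $used = grep { @$_ } @$table;
    return $used ? "$used/" . @$table : 0;
}

print toy_ratio(build_table(3, qw/a d g j/)), "\n";   # prints 1/3
print toy_ratio(build_table(3, qw/a b c/)),   "\n";   # prints 3/3
```

The keys a, d, g and j have character codes 97, 100, 103 and 106, which all leave a remainder of 1 when divided by 3, so four keys pile into one bucket and the ratio of 1/3 exposes the problem.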

To learn more about hashes, ask questions here about what I have said, or read about them.

Answer 4

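Adding another answer because the first one is already too long.

Another way to see what "4/16" means is to use the Hash::Esoteric module (warning: alpha-quality code). I wrote it to give me a better picture of what is going on inside a hash, so that I could try to understand a performance problem that large hashes seem to have. The keys_by_bucket function from Hash::Esoteric returns all of the keys in the hash, but rather than returning them as a list like keys does, it returns them as an AoA (array of arrays) in which the top level represents the buckets and each arrayref inside it holds the keys in that bucket.

The code below prints each bucket with its keys, followed by the used/total bucket counts (the actual data depends on the version of Perl):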

#!/usr/bin/env perl

use strict;
use warnings;

use Hash::Esoteric qw/keys_by_bucket/;

my %hash = map { $_ => undef } "a" .. "g";
my $buckets = keys_by_bucket \%hash;

my $used = 0;
for my $i (0 .. $#$buckets) {
    if (@{$buckets->[$i]}) {
        $used++;
    }
    print "bucket $i\n";
    for my $key (@{$buckets->[$i]}) {
        print "\t$key\n";
    }
}

print "scalar %hash: ", scalar %hash, "\n",
      "used/total buckets: $used/", scalar @$buckets, "\n";

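If installing an alpha-quality module is not an option, a similar view can probably be had from the core Hash::Util module. The sketch below assumes Hash::Util::bucket_array is available and behaves as its documentation describes: it returns a reference to a packed array in which an integer K stands for K empty buckets and an array reference holds the keys of one used bucket.

```perl
use strict;
use warnings;
use Hash::Util ();

my %hash = map { $_ => undef } "a" .. "g";

# Packed bucket list: an integer K means K empty buckets,
# an array reference holds the keys of one used bucket.
my $packed = Hash::Util::bucket_array(\%hash);

my ($used, $total, @seen) = (0, 0);
for my $entry (@$packed) {
    if (ref $entry) {
        $used++;
        $total++;
        push @seen, @$entry;
        print "bucket: @$entry\n";
    }
    else {
        $total += $entry;
        print "$entry empty bucket(s)\n";
    }
}
print "used/total buckets: $used/$total\n";
```

Which keys share a bucket changes from run to run because the hash seed is randomized.
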
Answer 5

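The fraction is the fill rate of the hash: used buckets versus allocated buckets. It is also sometimes called the load factor.

To actually get "4/16" you need some tricks. 4 keys will lead to 8 buckets, so you need at least 9 keys and must then delete 5.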

$ perl -le'%h=(0..16); print scalar %h; delete $h{$_} for 0..8; print scalar %h'
9/16
4/16

Note that your numbers will vary, since the seed is randomized, and you will not be able to predict the exact collisions.
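
The trick relies on deleting keys not shrinking the bucket array. Here is a minimal sketch of that behaviour using Hash::Util::bucket_info instead of scalar(%h), so that it does not depend on what a hash returns in scalar context on your Perl. bucket_counts is a hypothetical helper, and the exact bucket numbers are an implementation detail that may differ between versions.

```perl
use strict;
use warnings;
use Hash::Util ();

# Hypothetical helper: (key count, allocated buckets, used buckets).
sub bucket_counts {
    my ($hash_ref) = @_;
    my ($keys, $buckets, $used) = Hash::Util::bucket_info($hash_ref);
    return ($keys, $buckets, $used);
}

my %h = map { $_ => 1 } 1 .. 9;          # nine keys
my @before = bucket_counts(\%h);

delete @h{ 1 .. 5 };                     # drop five, four keys remain
my @after = bucket_counts(\%h);

print "before delete: $before[2]/$before[1] used/allocated, $before[0] keys\n";
print "after delete:  $after[2]/$after[1] used/allocated, $after[0] keys\n";
```

The allocated count should be the same on both lines, while the used count after the delete can be at most 4.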

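The fill rate is the critical piece of hash information for deciding when to rehash. Perl 5 rehashes at a fill rate of 100%; see the DO_HSPLIT macro in hv.c. Thus it trades memory for read-only speed. A normal fill rate would be between 80% and 95%. You always leave holes to save some collisions. Lower fill rates lead to faster accesses (fewer collisions) but more rehashes.

The fraction does not immediately show you the number of collisions. You also need keys %hash, to compare against the numerator of the fraction, the number of used buckets.

Thus one part of the collision quality is keys / used buckets: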

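# assumes scalar(%hash) returns the "used/total" string (Perl before 5.26)
my ($used, $max) = split '/', scalar(%hash);
my $keys_per_used_bucket = keys(%hash) / $used;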

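In practice, though, you need to know the sum of the lengths of all the linked lists in the buckets. You can get at this measure with Hash::Util::bucket_info: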

my ($keys, $buckets, $used, @length_count) = Hash::Util::bucket_info(\%hash);
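
As a hedged illustration of reading those return values, the sketch below prints the chain-length histogram for a hash of 1000 integer keys (an arbitrary size picked for the example). It assumes, as the Hash::Util documentation describes, that element $n of @length_count is the number of buckets holding exactly $n keys, with element 0 counting the empty buckets.

```perl
use strict;
use warnings;
use Hash::Util ();

my %hash = map { $_ => 1 } 1 .. 1000;

# $length_count[$n] is the number of buckets whose chain holds $n keys.
my ($keys, $buckets, $used, @length_count) = Hash::Util::bucket_info(\%hash);

my $chained = 0;
for my $n (1 .. $#length_count) {
    next unless $length_count[$n];
    $chained += $n * $length_count[$n];   # keys accounted for so far
    print "$length_count[$n] bucket(s) with $n key(s)\n";
}

printf "%d keys in %d of %d buckets, longest chain %d, %.2f keys per used bucket\n",
    $keys, $used, $buckets, $#length_count, $keys / $used;
```

The histogram changes between runs because of the randomized seed, but the chain lengths always add up to the number of keys.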

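While hash access is normally O(1), with long chains it is only O(n/2), especially for the overlong buckets. At https://github.com/rurban/perl-hash-stats I provide statistics on such collision qualities for various hash functions, using the perl5 core test suite data. I haven't tested the tradeoffs of different fill rates yet, as I am rewriting the current hash tables completely.

Update: for perl5, a better fill rate than 100% would be 90%, as tested recently. But this depends on the hash function used. I used a good and fast one: FNV1A. With better, slower hash functions you could use higher fill rates. The current default, OOAT_HARD, is bad and slow, so it should be avoided.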

Answer 6

if (%hash) evaluates the hash in scalar context.

Here is an empty hash:

command_line_prompt> perl -le '%hash=(); print scalar %hash;'

The result is 0.

Here is a non-empty hash:

command_line_prompt> perl -le '%hash=(foo=>"bar"); print scalar %hash;'

The result is the string "1/8".
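
Whatever string or number the hash produces, the truth test itself is the part to rely on. A minimal sketch, with describe as a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: report emptiness using only the truth value,
# which does not depend on what string scalar(%hash) happens to return.
sub describe {
    my ($hash_ref) = @_;
    return %$hash_ref ? "non-empty" : "empty";
}

my %empty;
my %filled = (foo => 'bar');

print "empty hash is ",  describe(\%empty),  "\n";
print "filled hash is ", describe(\%filled), "\n";
```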

What is 4/16 in hashes?
