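Hi, I am writing a PHP class that implements the Rabin-Karp algorithm. I have a problem with the rehashing part. This code does not include the character-matching part yet. I had to stop because of the rehashing problem: the rolling hash never matches the pattern's hash code. Can someone please help me figure it out?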
<?php
class RabinKarp
{
    /**
     * @var String
     */
    private $pattern;
    private $patternHash;
    private $text;
    private $previousHash;
    /**
     * @var Integer
     */
    private $radix;
    private $prime;
    private $position;

    /**
     * Constructor
     *
     * @param String $pattern - The pattern
     *
     */
    public function __construct($pattern)
    {
        $this->pattern = $pattern;
        $this->radix = 256;
        $this->prime = 100007;
        $this->previousHash = "";
        $this->position = 0;
        $this->patternHash = $this->generateHash($pattern);
    }

    private function generateHash($key)
    {
        $charArray = str_split($key);
        $hash = 0;
        foreach ($charArray as $char) {
            $hash = ($this->radix * $hash + ord($char)) % $this->prime;
        }
        return $hash;
    }

    public function search($character)
    {
        $this->text .= $character;
        if (strlen($this->text) < strlen($this->pattern)) {
            return false;
        } else {
            $txtHash = 0;
            echo $this->previousHash . "<br/>";
            if (empty($this->previousHash)) {
                $txtHash = $this->generateHash($this->text);
                $this->previousHash = $txtHash;
                $this->position = 0;
            } else {
                // The issue is here
                $charArray = str_split($this->text);
                $txtHash = (($txtHash + $this->prime) - $this->radix * strlen($this->pattern) * ord($charArray[$this->position]) % $this->prime) % $this->prime;
                $txtHash = ($txtHash * $this->radix + ord($character)) % $this->prime;
                $this->previousHash = $txtHash;
            }
            if ($txtHash == $this->patternHash) {
                echo "Hash Match found";
            }
        }
    }
}

$x = new RabinKarp("ABC");
$x->search("Z");
$x->search("A");
$x->search("B");
$x->search("C");
?>
Thanks.
Answer 0 (score: 1)
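The contribution to the hash of the character you are removing (call it c for short) is
ord(c) * radix^(length(pattern)-1)
because a character contributes ord(c) when it first enters the matching window, and the hash, and therefore that contribution too, is multiplied by radix for each of the length(pattern)-1 characters that enter the window before c finally leaves it.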
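To make the formula concrete, here is a small check without any modulus, so it is only safe for very short strings. The helper name plain_hash is made up for this illustration and is not part of the asker's class.

```php
<?php
// Hypothetical helper for illustration: polynomial hash WITHOUT a modulus.
// Only safe for very short strings, since the value grows as radix^length.
function plain_hash(string $s, int $radix = 256): int
{
    $hash = 0;
    foreach (str_split($s) as $char) {
        $hash = $radix * $hash + ord($char);
    }
    return $hash;
}

$radix  = 256;
$length = 3;                    // window (pattern) length
$old    = plain_hash("ZAB");    // hash of the window before the shift

// Remove the contribution of the leading 'Z': ord('Z') * radix^(length-1).
$withoutZ = $old - ord("Z") * $radix ** ($length - 1);

// Shift the remaining characters up by one position and append 'C'.
$new = $withoutZ * $radix + ord("C");

// Compare against hashing the new window "ABC" from scratch.
var_dump($new === plain_hash("ABC"));
```

Subtracting ord(c) * radix * length(pattern) instead would leave part of the old character's contribution in the hash, so the comparison would fail.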
But you are subtracting ord(c) * radix * length(pattern):
$charArray = str_split($this->text);
$txtHash = (($txtHash + $this->prime)
- $this->radix * strlen($this->pattern)
* ord($charArray[$this->position]) % $this->prime)
% $this->prime;
$txtHash = ($txtHash * $this->radix + ord($character)) % $this->prime;
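Also, the calculation uses the variable $txtHash, which has just been set to 0; it should use $this->previousHash instead. And you have to increment the text position.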
In principle,
$charArray = str_split($this->text);
$txtHash = (($this->previousHash + $this->prime)
- pow($this->radix, strlen($this->pattern)-1)
* ord($charArray[$this->position]) % $this->prime)
% $this->prime;
$txtHash = ($txtHash * $this->radix + ord($character)) % $this->prime;
$this->previousHash = $txtHash;
$this->position += 1;
is what you have to do.
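But unless the pattern is very short, pow($this->radix, strlen($this->pattern)-1) will overflow the integer range, so you have to replace pow($this->radix, strlen($this->pattern)-1) with a modular exponentiation function: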
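function mod_pow($base, $exponent, $modulus)
{
    // Square-and-multiply; every intermediate result is reduced modulo $modulus.
    $aux = 1;
    $base = $base % $modulus;
    while ($exponent > 0) {
        if ($exponent % 2 == 1) {
            $aux = ($aux * $base) % $modulus;
        }
        $base = ($base * $base) % $modulus;
        // Integer division: $exponent / 2 would yield a float for odd exponents.
        $exponent = intdiv($exponent, 2);
    }
    return $aux;
}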
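(This can still overflow if the $modulus, here $this->prime, is too large, because the products $aux * $base and $base * $base are computed before they are reduced.) The relevant lines of code become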
$txtHash = (($this->previousHash + $this->prime)
- mod_pow($this->radix, strlen($this->pattern)-1, $this->prime)
* ord($charArray[$this->position]) % $this->prime)
% $this->prime;
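The overflow caveat for the modulus can be checked up front. The following is a minimal sketch, assuming PHP 7 or later for intdiv; the function name modulus_is_safe is hypothetical. It tests whether the largest product of two reduced values, ($modulus - 1)^2, still fits in a PHP integer.

```php
<?php
// Hypothetical helper: returns true when ($modulus - 1)^2 fits in a PHP
// integer, i.e. when multiplying two values already reduced modulo $modulus
// cannot overflow.
function modulus_is_safe(int $modulus): bool
{
    $max = $modulus - 1;    // largest value a reduced number can take
    // Compare by division so the check itself cannot overflow.
    return $max <= 1 || $max <= intdiv(PHP_INT_MAX, $max);
}

var_dump(modulus_is_safe(1000));         // 999 * 999 fits in any PHP integer
var_dump(modulus_is_safe(PHP_INT_MAX));  // far too large to square
```

On a build where PHP_INT_MAX is 2^31 - 1, a modulus such as 100007 would already fail this check, while it passes where PHP integers are 64 bits wide.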
Then you have a potentially huge inefficiency in
$this->text .= $character;
...
$charArray = str_split($this->text);
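If the string becomes long, the concatenation and the splitting can take a lot of time (I am not sure how PHP implements strings and string operations, but they are probably not constant time). You should keep only the relevant part of the string, that is, drop the first character after recalculating the hash.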
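Putting the points above together, one possible shape for a streaming matcher is sketched below. It is not the asker's class: the names StreamingRabinKarp, feed, hashOf and modPow are made up for this illustration, the code assumes PHP 7.4 or later and a non-empty pattern, and it keeps only the last strlen($pattern) characters so the per-character cost is bounded by the pattern length.

```php
<?php
// A sketch under stated assumptions; class and method names are hypothetical.
class StreamingRabinKarp
{
    private string $pattern;
    private int $length;
    private int $radix = 256;
    private int $prime = 100007;
    private int $patternHash;
    private int $highPower;       // radix^(length-1) mod prime
    private string $window = '';  // only the last $length characters
    private int $windowHash = 0;

    public function __construct(string $pattern)
    {
        if ($pattern === '') {
            throw new InvalidArgumentException('Pattern must not be empty');
        }
        $this->pattern = $pattern;
        $this->length = strlen($pattern);
        $this->highPower = $this->modPow($this->radix, $this->length - 1);
        $this->patternHash = $this->hashOf($pattern);
    }

    // Polynomial hash of a whole string, reduced modulo the prime.
    private function hashOf(string $key): int
    {
        $hash = 0;
        foreach (str_split($key) as $char) {
            $hash = ($this->radix * $hash + ord($char)) % $this->prime;
        }
        return $hash;
    }

    // Square-and-multiply modular exponentiation.
    private function modPow(int $base, int $exponent): int
    {
        $result = 1;
        $base %= $this->prime;
        while ($exponent > 0) {
            if ($exponent % 2 === 1) {
                $result = ($result * $base) % $this->prime;
            }
            $base = ($base * $base) % $this->prime;
            $exponent = intdiv($exponent, 2);
        }
        return $result;
    }

    // Feed one character; returns true when the window now equals the pattern.
    public function feed(string $character): bool
    {
        if (strlen($this->window) === $this->length) {
            // Window is full: remove the contribution of its first character.
            $leading = ord($this->window[0]);
            $this->windowHash = ($this->windowHash + $this->prime
                - $this->highPower * $leading % $this->prime) % $this->prime;
            $this->window = substr($this->window, 1);
        }
        // Shift the hash by one position and append the new character.
        $this->windowHash = ($this->windowHash * $this->radix + ord($character))
            % $this->prime;
        $this->window .= $character;

        return strlen($this->window) === $this->length
            && $this->windowHash === $this->patternHash
            && $this->window === $this->pattern;  // rule out hash collisions
    }
}

$matcher = new StreamingRabinKarp("ABC");
$results = [];
foreach (str_split("ZABCABC") as $ch) {
    $results[] = $matcher->feed($ch);
}
var_dump($results);
```

The final string comparison runs only when the hashes agree, which is the character-matching step that the question left out.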