Rabin-Karp constant-time substring hash lookup

Time: 2017-12-09 18:29:43

Tags: algorithm hash modulo integer-overflow string-search

Given a string S[0..N-1], I want to be able to get the hash of any of its substrings S[i..j] in O(1) time, with O(N) preprocessing. This is what I have so far (in Python):

BASE = 31

word = "abcxxabc"
N = len(word)

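powers = []         # powers[k] == BASE**k
prefix_hashes = []  # prefix_hashes[k] == hash of word[0..k]

b = 1
h = 0
for i in range(N):
    powers.append(b)
    b *= BASE

    # Map 'a'..'z' to 1..26 so that no character contributes zero.
    h += (ord(word[i]) - ord('a') + 1)
    h *= BASE
    prefix_hashes.append(h)

def get_hash(i, j):
    # Hash of word[i..j], both ends inclusive.
    if i == 0:
        return prefix_hashes[j]
    # Shift the prefix hash up by the substring length, then subtract it.
    return prefix_hashes[j] - prefix_hashes[i - 1] * powers[j - i + 1]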
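
As a quick sanity check, here is a minimal self-contained sketch of the same scheme. The function names `build` and `substring_hash` are introduced only for this illustration and do not appear in the code above. It checks that the two occurrences of "abc" in "abcxxabc" hash to the same value while a different substring of the same length does not.

```python
BASE = 31

def build(word):
    """Return (powers, prefix_hashes), built the same way as the loop above."""
    powers, prefix_hashes = [], []
    b, h = 1, 0
    for ch in word:
        powers.append(b)
        b *= BASE
        h += ord(ch) - ord('a') + 1   # 'a'..'z' -> 1..26
        h *= BASE
        prefix_hashes.append(h)
    return powers, prefix_hashes

def substring_hash(powers, prefix_hashes, i, j):
    """Hash of word[i..j] (inclusive) using one multiply and one subtract."""
    if i == 0:
        return prefix_hashes[j]
    return prefix_hashes[j] - prefix_hashes[i - 1] * powers[j - i + 1]

powers, prefix_hashes = build("abcxxabc")
first_abc = substring_hash(powers, prefix_hashes, 0, 2)    # "abc" at 0..2
second_abc = substring_hash(powers, prefix_hashes, 5, 7)   # "abc" at 5..7
bcx = substring_hash(powers, prefix_hashes, 1, 3)          # "bcx" at 1..3

print(first_abc == second_abc)  # True
print(first_abc != bcx)         # True
```

With this construction the hash of S[i..j] is the sum of c_k * BASE^(j - k + 1) for k from i to j, so it depends only on the characters of the substring and not on where the substring starts.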

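It works fine... in Python, where I don't need to worry about possible integer overflow. I would like to be able to do the above (effectively rewriting the whole algorithm) modulo some large prime, so that everything fits in 32-bit integer arithmetic. This is what I came up with: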

MOD = 10**9 + 7
BASE = 31

word = "abcxxabc"
N = len(word)

powers = []
prefix_hashes = []

b = 1
h = 0
for i in range(N):
    powers.append(b)
    b = (b * BASE) % MOD

    h += (ord(word[i]) - ord('a') + 1)
    h = (h * BASE) % MOD
    prefix_hashes.append(h)

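def get_hash(i, j):
    if i == 0:
        return prefix_hashes[j]
    # With a positive MOD, Python's % returns a non-negative result
    # even when the difference is negative.
    return (prefix_hashes[j] - prefix_hashes[i - 1] * powers[j - i + 1]) % MOD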

However, the

prefix_hashes[i - 1] * powers[j - i + 1]

part can easily overflow for larger values of MOD. How do I get around this?
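
One possible way around this, sketched here under stated assumptions rather than as a definitive answer, is to replace the product with a shift-and-add modular multiplication. Every intermediate value then stays below 2 * MOD, and with MOD = 10**9 + 7 that is below 2**31, so it fits in a signed 32-bit integer. The helper `mulmod`, the constant `LIMIT`, and the function `build` are hypothetical names introduced for this sketch. Python integers never overflow, so the code asserts the 32-bit bound instead of relying on it. Note that `b * BASE` and `h * BASE` in the preprocessing loop can exceed 32 bits as well (about 10**9 times 31), so the sketch routes them through the same helper.

```python
MOD = 10**9 + 7
BASE = 31
LIMIT = 2**31  # a signed 32-bit value must stay below this

def mulmod(a, b, mod=MOD):
    """(a * b) % mod by shift-and-add (hypothetical helper).

    Assumes 0 <= a, b < mod and 2 * mod <= 2**31, so no intermediate
    value needs more than 32 bits.
    """
    result = 0
    while b > 0:
        if b & 1:
            result += a            # result < 2 * mod
            assert result < LIMIT
            if result >= mod:
                result -= mod
        a += a                     # a < 2 * mod
        assert a < LIMIT
        if a >= mod:
            a -= mod
        b >>= 1
    return result

def build(word):
    """Prefix hashes and powers of BASE, all reduced modulo MOD."""
    powers, prefix_hashes = [], []
    b, h = 1, 0
    for ch in word:
        powers.append(b)
        b = mulmod(b, BASE)
        h = (h + ord(ch) - ord('a') + 1) % MOD   # keep h < MOD for mulmod
        h = mulmod(h, BASE)
        prefix_hashes.append(h)
    return powers, prefix_hashes

def get_hash(powers, prefix_hashes, i, j):
    """Hash of word[i..j] (inclusive) modulo MOD."""
    if i == 0:
        return prefix_hashes[j]
    shifted = mulmod(prefix_hashes[i - 1], powers[j - i + 1])
    # Add MOD before reducing so the result is non-negative even in
    # languages where % can return a negative value.
    return (prefix_hashes[j] - shifted + MOD) % MOD

powers, prefix_hashes = build("abcxxabc")
print(get_hash(powers, prefix_hashes, 0, 2) == get_hash(powers, prefix_hashes, 5, 7))
```

The trade-off is that each multiplication takes about log2(MOD) iterations (roughly 30 here), so a query costs a constant number of steps independent of N but is slower than a single multiply. Where a 64-bit integer type is available, widening the product before reducing is simpler: both factors are below MOD < 2**30, so the product is below 2**60.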

0 answers:

No answers