Given a string S[0..N-1], I want to be able to get the hash of any of its substrings S[i..j] in O(1) time, with O(N) preprocessing. This is what I have so far (in Python):

BASE = 31
word = "abcxxabc"
N = len(word)
powers = []
prefix_hashes = []
b = 1
h = 0
for i in range(N):
    powers.append(b)
    b *= BASE
    h += (ord(word[i]) - ord('a') + 1)
    h *= BASE
    prefix_hashes.append(h)

def get_hash(i, j):
    if i == 0:
        return prefix_hashes[j]
    return prefix_hashes[j] - prefix_hashes[i - 1] * powers[j - i + 1]
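
As a quick sanity check of the scheme above, here is a small self-contained sketch. The helper name `build` and the chosen index pairs are only for illustration; the idea is that the two occurrences of "abc" in the word should get the same hash, while a different substring of the same length should not.

```python
BASE = 31

def build(word):
    """Return (powers, prefix_hashes) using exact, unbounded Python integers."""
    powers, prefix_hashes = [], []
    b, h = 1, 0
    for ch in word:
        powers.append(b)
        b *= BASE
        h += ord(ch) - ord('a') + 1
        h *= BASE
        prefix_hashes.append(h)
    return powers, prefix_hashes

def get_hash(powers, prefix_hashes, i, j):
    """Hash of word[i..j] (inclusive), computed in O(1) from the prefix hashes."""
    if i == 0:
        return prefix_hashes[j]
    return prefix_hashes[j] - prefix_hashes[i - 1] * powers[j - i + 1]

powers, prefix_hashes = build("abcxxabc")
first_abc = get_hash(powers, prefix_hashes, 0, 2)   # "abc" at the start
second_abc = get_hash(powers, prefix_hashes, 5, 7)  # "abc" at the end
different = get_hash(powers, prefix_hashes, 1, 3)   # "bcx"
print(first_abc == second_abc, first_abc == different)
```

For this word it prints `True False`.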
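
It works fine... in Python, where I don't have to worry about integer overflow. I would like to do the same thing (effectively rewriting the whole algorithm), but modulo some large prime, so that everything fits in 32-bit integer arithmetic. This is what I came up with:
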
MOD = 10**9 + 7
BASE = 31
word = "abcxxabc"
N = len(word)
powers = []
prefix_hashes = []
b = 1
h = 0
for i in range(N):
    powers.append(b)
    b = (b * BASE) % MOD
    h += (ord(word[i]) - ord('a') + 1)
    h = (h * BASE) % MOD
    prefix_hashes.append(h)

def get_hash(i, j):
    if i == 0:
        return prefix_hashes[j]
    return (prefix_hashes[j] - prefix_hashes[i - 1] * powers[j - i + 1]) % MOD
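
In Python itself, the modular version appears to agree with the exact one. The sketch below compares the two for every substring of the example word; `build`, its `mod` parameter, and `mismatches` are names I made up for this check, and it relies on Python's `%` returning a non-negative result for a positive modulus.

```python
MOD = 10**9 + 7
BASE = 31

def build(word, mod=None):
    """Prefix hashes and powers; reduced modulo `mod` when one is given."""
    powers, prefix_hashes = [], []
    b, h = 1, 0
    for ch in word:
        powers.append(b)
        b = b * BASE
        h = (h + ord(ch) - ord('a') + 1) * BASE
        if mod is not None:
            b %= mod
            h %= mod
        prefix_hashes.append(h)
    return powers, prefix_hashes

def get_hash(powers, prefix_hashes, i, j, mod=None):
    """Hash of word[i..j] (inclusive), optionally reduced modulo `mod`."""
    if i == 0:
        value = prefix_hashes[j]
    else:
        value = prefix_hashes[j] - prefix_hashes[i - 1] * powers[j - i + 1]
    return value if mod is None else value % mod

word = "abcxxabc"
exact = build(word)
modular = build(word, MOD)

# Collect every (i, j) where the modular hash differs from the exact hash mod MOD.
mismatches = [
    (i, j)
    for i in range(len(word))
    for j in range(i, len(word))
    if get_hash(*modular, i, j, MOD) != get_hash(*exact, i, j) % MOD
]
print(mismatches)
```

It prints `[]`, so the arithmetic is consistent as long as the integers are unbounded.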

But the prefix_hashes[i - 1] * powers[j - i + 1] part can easily overflow for large values of MOD. How do I get around this?
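
To make the concern concrete, here is a rough sketch of the worst case, assuming both factors have already been reduced modulo MOD. The names `worst_factor`, `worst_product`, `INT32_MAX` and `INT64_MAX` are mine, chosen only for this illustration.

```python
MOD = 10**9 + 7

# Largest value either factor can take once it is reduced modulo MOD.
worst_factor = MOD - 1
worst_product = worst_factor * worst_factor

INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1

print(worst_factor <= INT32_MAX)    # each factor fits in a signed 32-bit integer
print(worst_product <= INT32_MAX)   # their product does not
print(worst_product <= INT64_MAX)   # but it fits in a signed 64-bit integer
print(worst_product.bit_length())   # number of bits the product needs
```

This prints `True`, `False`, `True` and `60`. So for this particular MOD, and under the assumption above, the intermediate product needs 60 bits: too wide for 32-bit arithmetic, though still within a signed 64-bit integer.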