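Goal: implement an algorithm that, given strings a and b, returns the shortest substring of a containing all the characters of b. The string b may contain duplicates.
The algorithm in the linked article only finds the length of the shortest substring, but adapting it is a minor change.
Here is my implementation: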
import collections

def issubset(c1, c2):
    '''Return True if c1 is a subset of c2, False otherwise.'''
    return not c1 - (c1 & c2)

def min_idx(seq, target):
    '''Least index of seq such that seq[idx] is contained in target.'''
    for idx, elem in enumerate(seq):
        if elem in target:
            return idx

def minsub(a, b):
    target_hist = collections.Counter(b)
    current_hist = collections.Counter()

    # Skip all the useless characters
    idx = min_idx(a, target_hist)
    if idx is None:
        return []

    a = a[idx:]

    # Build a base substring
    i = iter(a)
    current = []
    while not issubset(target_hist, current_hist):
        t = next(i)
        current.append(t)
        current_hist[t] += 1

    minlen = len(current)
    shortest = current

    for t in i:
        current.append(t)

        # Shorten the substring from the front as much as possible
        if t == current[0]:
            idx = min_idx(current[1:], target_hist) + 1
            current = current[idx:]

        if len(current) < minlen:
            minlen = len(current)
            shortest = current

    return current
Unfortunately, it does not work. For example,
>>> minsub('this is a test string', 'tist')
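['s', ' ', 'i', 's', ' ', 'a', ' ', 't', 'e', 's', 't', ' ', 's', 't', 'r', 'i', 'n', 'g']
What am I missing?
Side note: I'm not sure whether my implementation is O(n), but that is a different question. For now, I'm only looking to fix my implementation.
EDIT: a seemingly working solution: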
import collections

def issubset(c1, c2):
    '''Return True if c1 is a subset of c2, False otherwise.'''
    return not c1 - (c1 & c2)

def min_idx(seq, target):
    '''Least index of seq such that seq[idx] is contained in target.'''
    for idx, elem in enumerate(seq):
        if elem in target:
            return idx

def minsub(a, b):
    target_hist = collections.Counter(b)
    current_hist = collections.Counter()

    # Skip all the useless characters
    idx = min_idx(a, target_hist)
    if idx is None:
        return []

    a = a[idx:]

    # Build a base substring
    i = iter(a)
    current = []
    while not issubset(target_hist, current_hist):
        t = next(i)
        current.append(t)
        current_hist[t] += 1

    minlen = len(current)
    shortest = current[:]

    for t in i:
        current.append(t)

        # Shorten the substring from the front as much as possible
        if t == current[0]:
            current_hist = collections.Counter(current)
            for idx, elem in enumerate(current[1:], 1):
                if not current_hist[elem] - target_hist[elem]:
                    break
                current_hist[elem] -= 1
            current = current[idx:]

        if len(current) < minlen:
            minlen = len(current)
            shortest = current[:]

    return shortest
Answer 0 (score: 1)
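The problem is in this step, when we append a character to current and it matches the first character:
Remove the leftmost character and all the other extra characters after it.
The value of idx computed by
idx = min_idx(current[1:], target_hist) + 1
is sometimes lower than expected: idx should keep increasing as long as target_hist is a subset of current_hist. So we need to keep current_hist up to date in order to compute the correct value for idx. Also, minsub should return shortest rather than current.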
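As a small illustration of the subset test that drives this shortening loop, here is a sketch that reuses the issubset helper from the question on the example strings. The names too_few and enough are made up for the demonstration.

```python
import collections

def issubset(c1, c2):
    '''Return True if c1 is a subset of c2, False otherwise.'''
    return not c1 - (c1 & c2)

target_hist = collections.Counter('tist')

# 'this' has every distinct character of 'tist' but only one 't',
# so the multiset test fails.
too_few = issubset(target_hist, collections.Counter('this'))

# 'this is a t' supplies the second 't', so the window is complete.
enough = issubset(target_hist, collections.Counter('this is a t'))

print(too_few, enough)  # False True
```

Because the test counts multiplicities, it only gives the right answer when current_hist reflects exactly the characters currently in the window.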
# Relies on `import collections`, issubset and min_idx from the question.
def minsub(a, b):
    target_hist = collections.Counter(b)
    current_hist = collections.Counter()

    # Skip all the useless characters
    idx = min_idx(a, target_hist)
    if idx is None:
        return []

    a = a[idx:]

    # Build a base substring
    i = iter(a)
    current = []
    while not issubset(target_hist, current_hist):
        t = next(i)
        current.append(t)
        if t in target_hist:
            current_hist[t] += 1

    minlen = len(current)
    # Copy, so that later appends to current do not alter shortest
    shortest = current[:]

    #current = []
    for t in i:
        current.append(t)
        current_hist[t] += 1

        # Shorten the substring from the front as much as possible
        if t == current[0]:
            #idx = min_idx(current[1:], target_hist) + 1
            idx = 0
            # Drop characters from the front while the window still covers b
            while issubset(target_hist, current_hist):
                u = current[idx]
                current_hist[u] -= 1
                idx += 1
            # The last drop broke the coverage, so undo it
            idx -= 1
            u = current[idx]
            current_hist[u] += 1
            current = current[idx:]

        if len(current) < minlen:
            minlen = len(current)
            shortest = current[:]

    return shortest
In [9]: minsub('this is a test string', 'tist')
Out[9]: ['t', ' ', 's', 't', 'r', 'i']
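Regarding the side note about O(n): the version above re-runs the subset test on every shrink step, so its cost depends on the alphabet size as well as on the length of a. For comparison, here is a minimal sketch of the usual two-index sliding window, which moves each index forward at most len(text) times. This is a different formulation, not the code from the question. The names min_window, need and missing are chosen for the illustration, and it returns a string rather than a list.

```python
import collections

def min_window(text, required):
    '''Shortest slice of text containing every character of required
    (with multiplicity), or '' if there is none.'''
    if not required:
        return ''
    need = collections.Counter(required)  # > 0: still needed, < 0: surplus
    missing = len(required)               # required characters not yet covered
    best = None                           # (start, end) with end exclusive
    left = 0
    for right, ch in enumerate(text, 1):
        if need[ch] > 0:
            missing -= 1
        need[ch] -= 1
        if missing == 0:
            # Drop surplus characters from the front of the window
            while need[text[left]] < 0:
                need[text[left]] += 1
                left += 1
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            # Give up the leftmost needed character and keep scanning
            need[text[left]] += 1
            missing += 1
            left += 1
    return '' if best is None else text[best[0]:best[1]]

print(min_window('this is a test string', 'tist'))  # t stri
```

On the example from the question it yields the same six characters as the answer's output, joined into a string.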