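I've been working on this problem for my own edification and I can't seem to crack it:
Given two strings that are permutations of each other (the same characters, possibly with repeats), find the minimum number of swaps of adjacent characters needed to align them. The strings are circular (i.e. you can also swap the first and last characters), can be up to 2000 characters long, and swaps can be performed on either string.
I've tried a few different approaches. One was to bubble sort both strings, cache all of their intermediate states, and then find an intermediate state common to both that minimizes the sum of the number of steps each string needs to reach it. That didn't produce the right numbers (I have some examples with correct answers), and it obviously won't work for very large strings. I'm kind of stumped at this point. I think I may need a modified version of the Damerau-Levenshtein distance, but how do you choose the minimum-cost operation when only one type of operation is allowed?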
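For reference (this is not from the original thread): the bubble-sort idea is closely tied to inversions. Bubble sort performs exactly one adjacent swap per inversion, and in the non-circular version of the problem the minimum number of adjacent swaps that turns `a` into `b` equals the number of inversions you get after matching the k-th occurrence of each character in `a` with the k-th occurrence of the same character in `b`. Every non-wrapping swap is also allowed in the circular problem, so this number is only an upper bound for the circular case, not the answer. A minimal sketch; the names `linear_adjacent_swaps` and `count_inversions` are hypothetical:

```python
from collections import defaultdict, deque


def count_inversions(seq):
    """Return (sorted copy of seq, number of inversions) using merge sort."""
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, inv_left = count_inversions(seq[:mid])
    right, inv_right = count_inversions(seq[mid:])
    merged, inversions = [], inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining element of left.
            merged.append(right[j])
            j += 1
            inversions += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


def linear_adjacent_swaps(a, b):
    """Minimum adjacent swaps turning a into b when swaps may NOT wrap around."""
    if sorted(a) != sorted(b):
        raise ValueError("strings must be permutations of each other")
    # Left-to-right queue of target positions in b for every character.
    targets = defaultdict(deque)
    for j, ch in enumerate(b):
        targets[ch].append(j)
    # Match the k-th occurrence of ch in a with the k-th occurrence in b.
    perm = [targets[ch].popleft() for ch in a]
    return count_inversions(perm)[1]


print(linear_adjacent_swaps("abc", "cab"))  # 2
```

This runs in O(n log n), so it is instant for 2000 characters, and any optimal circular answer must be less than or equal to it, which makes it a cheap sanity check for other solvers.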
Can anyone point me in the right direction?
Answer (score: 1)
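Consider a solution based on the A* search algorithm. The idea is to model this problem as a graph and find the optimal path to the goal node.
First, some basic clarifications. Each node in the graph is a string, and two nodes are neighbours if and only if they differ by a single (circular) swap of adjacent characters. The start node is the first string and the goal node is the second string. Finding the shortest path from the start to the goal gives you the optimal solution. It is enough to modify only the first string: any swaps applied to the second string can instead be applied to the first string, in reverse order, after its own swaps, giving the same total count.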
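Because every edge in this graph has the same cost, plain breadth-first search already returns the exact shortest path; A* only adds a heuristic to prune the search. The sketch below is an illustration rather than part of the original answer (`bfs_min_swaps` and `circular_neighbours` are hypothetical names). It is useful as a reference solver for checking answers on short strings, but it can visit up to n! states, so it is hopeless anywhere near 2000 characters:

```python
from collections import deque


def circular_neighbours(s):
    """All strings reachable from s by one swap of cyclically adjacent characters."""
    n = len(s)
    out = set()  # a set, because for n == 2 both swaps give the same string
    for i in range(n):
        j = (i + 1) % n  # i == n - 1 pairs the last character with the first
        c = list(s)
        c[i], c[j] = c[j], c[i]
        out.add(''.join(c))
    return out


def bfs_min_swaps(start, goal):
    """Exact minimum number of circular adjacent swaps, by breadth-first search."""
    if sorted(start) != sorted(goal):
        raise ValueError("strings must be permutations of each other")
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            return dist[cur]
        for nb in circular_neighbours(cur):
            if nb not in dist:
                dist[nb] = dist[cur] + 1
                queue.append(nb)
    raise AssertionError("unreachable: adjacent swaps connect all rearrangements")


print(bfs_min_swaps("abcd", "dabc"))  # 3
```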
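Now for the discussion. The A* implementation is very generic; the only interesting part here is the choice of heuristic function. I have implemented two heuristics. The first is known to be admissible, so A* is guaranteed to find an optimal path with it. I suspect the second one is admissible as well: every empirical attempt I made to show that it is not admissible failed, which increased my confidence that it is. The first heuristic is very slow; it seems to expand a number of nodes exponential in the string length. The second seems to expand a more reasonable number (perhaps polynomial in the string length).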
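One sum-based heuristic does come with a short admissibility proof (this is an addition, not something from the original answer). For each character, take the circular distance to the nearest goal position holding the same character, and sum these distances. One adjacent swap moves exactly two characters by one step each, so the sum drops by at most 2 per swap, and it is 0 at the goal; hence ceil(sum / 2) never overestimates the remaining swaps, and it changes by at most 1 per swap, so it is also consistent. A minimal sketch; `half_total_displacement` and `circular_distance` are hypothetical names:

```python
from collections import defaultdict


def circular_distance(i, j, n):
    """Number of steps between positions i and j on a ring of size n."""
    d = abs(i - j)
    return min(d, n - d)


def half_total_displacement(x, goal):
    """ceil(sum of per-character ring distances / 2): an admissible A* heuristic."""
    n = len(x)
    positions = defaultdict(list)  # character -> its positions in goal
    for j, ch in enumerate(goal):
        positions[ch].append(j)
    total = sum(min(circular_distance(i, j, n) for j in positions[ch])
                for i, ch in enumerate(x))
    return (total + 1) // 2  # integer ceil(total / 2)


print(half_total_displacement("abc", "cab"))  # 2
```

It takes the same `(x, y)` arguments as `terrible_heuristic`, so it can be passed to `a_star` as `heuristic_func`.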
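That said, this is a very interesting problem, and once I have more time I may try to justify the second heuristic theoretically.
Even better_heuristic seems to expand a number of nodes exponential in the string length, which doesn't bode well for either heuristic.
Code below. Feel free to come up with your own heuristics and try them out; these are just the first ones that came to mind.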
#!/usr/bin/env python3
import random
import string
from queue import PriorityQueue as priority_queue

# SWAP UTILITY FUNCTION: RETURN A COPY OF c WITH POSITIONS i AND j SWAPPED.
# j WRAPS AROUND, SO swap(c, len(c) - 1, len(c)) SWAPS THE LAST AND FIRST CHARACTERS.
def swap(c, i, j):
    j %= len(c)
    c = list(c)
    c[i], c[j] = c[j], c[i]
    return ''.join(c)

# FOR EACH CHARACTER OF THE CURRENT STATE x, THE MINIMUM CIRCULAR DISTANCE TO A
# POSITION IN THE GOAL y HOLDING THE SAME CHARACTER. SHARED BY BOTH HEURISTICS.
def min_swaps_per_char(x, y):
    lut = {}  # CHARACTER -> LIST OF ITS POSITIONS IN y
    for i, c in enumerate(y):
        lut.setdefault(c, []).append(i)
    size = len(x)
    return [min(min((i - j) % size, (j - i) % size) for j in lut[c])
            for i, c in enumerate(x)]

# GIVEN THE GOAL y AND THE CURRENT STATE x COMPUTE THE HEURISTIC DISTANCE
# AS THE MAXIMUM OVER THE MINIMUM NUMBER OF SWAPS FOR EACH CHARACTER TO GET
# TO ITS GOAL POSITION, MINUS ONE.
# NOTE: THIS HEURISTIC IS GUARANTEED TO BE ADMISSIBLE (EACH SWAP MOVES A GIVEN
# CHARACTER BY AT MOST ONE POSITION), THEREFORE A* WILL ALWAYS FIND THE
# OPTIMAL SOLUTION USING THIS HEURISTIC. IT IS HOWEVER NOT A STRONG HEURISTIC.
def terrible_heuristic(x, y):
    return max(min_swaps_per_char(x, y)) - 1

# GIVEN THE GOAL y AND THE CURRENT STATE x COMPUTE THE HEURISTIC DISTANCE
# AS THE SUM OVER THE MINIMUM NUMBER OF SWAPS (MINUS ONE) FOR EACH CHARACTER
# TO GET TO ITS GOAL POSITION, DIVIDED BY SOME CONSTANT. THE LOWER THE
# CONSTANT, THE FASTER A* COMPUTES THE SOLUTION.
# NOTE: THIS HEURISTIC IS CURRENTLY NOT THEORETICALLY JUSTIFIED. A PROOF
# SHOULD BE FORMULATED AND THEORETICAL WORK SHOULD BE DONE TO DETERMINE
# THE SMALLEST CONSTANT FOR WHICH THIS HEURISTIC IS ADMISSIBLE.
def better_heuristic(x, y):
    # THE CONSTANT TO DIVIDE THE SUM OF THE MINIMUM SWAPS BY. EMPIRICALLY, 1.5
    # SEEMS TO BE THE LOWEST ONE CAN CHOOSE BEFORE A* NO LONGER RETURNS
    # CORRECT SOLUTIONS.
    constant = 1.5
    return sum(d - 1 for d in min_swaps_per_char(x, y)) / constant

# GET ALL STRINGS ONE CAN FORM BY ONE SWAP OF TWO (CIRCULARLY) ADJACENT CHARACTERS
def ngbs(x):
    n = set()  # WE USE A SET FOR THE PATHOLOGICAL CASE len(x) == 2
    for i in range(len(x)):
        n.add(swap(x, i, i + 1))
    return n

# CONVENIENCE WRAPPER AROUND PYTHON'S PriorityQueue: put(item, priority)
class sane_priority_queue(priority_queue):
    def __init__(self):
        priority_queue.__init__(self)
        self.counter = 0

    def put(self, item, priority):
        # THE COUNTER BREAKS TIES, SO THE ITEMS THEMSELVES ARE NEVER COMPARED
        priority_queue.put(self, (priority, self.counter, item))
        self.counter += 1

    def get(self, *args, **kwargs):
        _, _, item = priority_queue.get(self, *args, **kwargs)
        return item

# AN A* IMPLEMENTATION THAT USES EXPANDING DATA-TYPES BECAUSE OUR FULL SEARCH
# SPACE COULD BE MASSIVE. HEURISTIC FUNCTION CAN BE SPECIFIED AT RUNTIME.
def a_star(x0, goal, heuristic_func=terrible_heuristic):
    visited = set()
    frontier = sane_priority_queue()
    distances = {x0: 0}
    predecessors = {x0: x0}
    frontier.put(x0, heuristic_func(x0, goal))
    while not frontier.empty():
        current = frontier.get()
        if current in visited:
            continue  # STALE ENTRY: current WAS QUEUED MORE THAN ONCE AND IS ALREADY EXPANDED
        if current == goal:
            print("goal found, distance:", distances[current], "nodes explored:", len(visited))
            return predecessors, distances
        visited.add(current)
        for n in ngbs(current):
            if n in visited:
                continue
            tentative_distance = distances[current] + 1
            if n not in distances or tentative_distance < distances[n]:
                predecessors[n] = current
                distances[n] = tentative_distance
                frontier.put(n, tentative_distance + heuristic_func(n, goal))
    raise ValueError("goal not reachable from x0; are the strings permutations of each other?")

# WALK THE PREDECESSOR LINKS BACK FROM THE GOAL AND PRINT EACH STATE
def print_path(predecessors, goal):
    current = goal
    while current != predecessors[current]:
        print(current)
        current = predecessors[current]
    print(current)  # THE START STRING

# SIZE OF STRINGS TO WORK WITH
n = 10
# GENERATE A RANDOM STRING
str1 = ''.join(random.choice(string.ascii_letters + string.digits) for _ in range(n))
# RANDOMLY SHUFFLE IT (STRINGS ARE IMMUTABLE, SO NO deepcopy IS NEEDED)
l = list(str1)
random.shuffle(l)
str2 = ''.join(l)
# PRINT THE STRINGS FOR VISUAL DISPLAY
print('str1', str1)
print('str2', str2)
# RUN A* WITH THE TERRIBLE HEURISTIC FOR A KNOWN OPTIMAL SOLUTION
print('a_star with terrible_heuristic:')
predecessors, distances = a_star(str1, str2, terrible_heuristic)
print_path(predecessors, str2)
# RUN A* WITH A BETTER HEURISTIC THAT IS NOT THEORETICALLY JUSTIFIED
# TO BE ADMISSIBLE. THE PURPOSE IS TO COMPARE AGAINST THE KNOWN
# ADMISSIBLE HEURISTIC TO SEE EMPIRICALLY HOW LOW THE CONSTANT CAN GO.
print('a_star with better_heuristic:')
predecessors, distances = a_star(str1, str2, better_heuristic)
print_path(predecessors, str2)
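The empirical admissibility check described in the answer can be made exhaustive for short strings: run one breadth-first search from the goal to get the true swap distance of every rearrangement (swaps are their own inverses, so distance to the goal equals distance from it), then flag every state where the heuristic exceeds that distance. The sketch below is an illustration, not the author's code: `scaled_sum_heuristic` is a hypothetical transcription of `better_heuristic`'s formula with the constant as a parameter, and `half_sum_heuristic` (ceil of the summed distances over 2, provably admissible because one swap moves two characters by one step) serves as a control that must report no violations:

```python
from collections import defaultdict, deque


def circular_neighbours(s):
    """All strings reachable from s by one swap of cyclically adjacent characters."""
    n = len(s)
    out = set()
    for i in range(n):
        j = (i + 1) % n
        c = list(s)
        c[i], c[j] = c[j], c[i]
        out.add(''.join(c))
    return out


def all_distances_to(goal):
    """Exact swap distance between goal and every rearrangement of it (one BFS)."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        cur = queue.popleft()
        for nb in circular_neighbours(cur):
            if nb not in dist:
                dist[nb] = dist[cur] + 1
                queue.append(nb)
    return dist


def min_ring_distances(x, goal):
    """Per character of x: ring distance to the nearest same character in goal."""
    n = len(x)
    positions = defaultdict(list)
    for j, ch in enumerate(goal):
        positions[ch].append(j)
    return [min(min(abs(i - j), n - abs(i - j)) for j in positions[ch])
            for i, ch in enumerate(x)]


def scaled_sum_heuristic(x, goal, constant=1.5):
    # Same formula as better_heuristic: sum of (d - 1), divided by a constant.
    return sum(d - 1 for d in min_ring_distances(x, goal)) / constant


def half_sum_heuristic(x, goal):
    # Control: ceil(sum of d / 2) never overestimates.
    return (sum(min_ring_distances(x, goal)) + 1) // 2


def admissibility_violations(heuristic, goal):
    """List (state, h, true distance) for every state where h overestimates."""
    violations = []
    for state, d in all_distances_to(goal).items():
        h = heuristic(state, goal)
        if h > d:
            violations.append((state, h, d))
    return violations


if __name__ == "__main__":
    for goal in ["abcdef", "aabbcc", "abcdefg"]:
        bad = admissibility_violations(scaled_sum_heuristic, goal)
        print(goal, "states where the 1.5 heuristic overestimates:", len(bad))
```

An empty result for short strings is evidence rather than proof, exactly as in the answer's own empirical tests; the state count grows like n!, so this is only practical up to roughly 8 to 9 characters.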