Question

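I'm writing a program to compute the Levenshtein distance in Python. I implemented memoization because I run the algorithm recursively. My original function implemented the memoization inside the function itself. Here is what it looks like: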

# Memoization table mapping from a tuple of two strings to their Levenshtein distance
dp = {}

# Levenshtein distance algorithm
def lev(s, t):

  # If the strings are 0, return length of other
  if not s:
    return len(t)
  if not t:
    return len(s)

  # If the last two characters are the same, no cost. Otherwise, cost of 1
  if s[-1] is t[-1]:
    cost = 0
  else:
    cost = 1

  # Save in dictionary if never calculated before
  if not (s[:-1], t) in dp:
    dp[(s[:-1], t)] = lev(s[:-1], t)
  if not (s, t[:-1]) in dp:
    dp[(s, t[:-1])] = lev(s, t[:-1])
  if not (s[:-1], t[:-1]) in dp:
    dp[(s[:-1], t[:-1])] = lev(s[:-1], t[:-1])

  # Returns minimum chars to delete from s, t, and both
  return min(dp[(s[:-1], t)] + 1,
             dp[(s, t[:-1])] + 1,
             dp[(s[:-1], t[:-1])] + cost)

This works! However, I found a way to memoize using decorators. I tried to apply this technique to my algorithm:

# Memoization table mapping from a tuple of two strings to their Levenshtein distance
def memoize(func):
  memo = {}
  def wrap(s, t):
    if (s, t) not in memo:
      memo[(s, t)] = func(s, t)
    return memo[(s, t)]
  return wrap

# Levenshtein distance algorithm
@memoize # lev = memoize(lev)
def lev(s, t):

  # If the strings are 0, return length of other
  if not s:
    return len(t)
  if not t:
    return len(s)

  # If the last two characters are the same, no cost. Otherwise, cost of 1
  if s[-1] is t[-1]:
    cost = 0
  else:
    cost = 1

  # Returns minimum chars to delete from s, t, and both
  return min(lev(s[:-1], t) + 1,
             lev(s, t[:-1]) + 1,
             lev(s[:-1], t[:-1]) + cost)

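To me, this looks cleaner and less cluttered. I thought the two would be functionally equivalent, but when I ran the version with the decorator, I was surprised to get a RecursionError: maximum recursion depth exceeded.

What exactly am I missing? Isn't using the decorator functionally the same? I tried a fix by adding sys.setrecursionlimit(1500), and that works, but it is a hack and doesn't explain why the two functions behave differently.

Note: I'm using one paragraph of lorem ipsum from Wikipedia as my test string for s and t: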

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

I understand that for even longer strings, my first function will fail too. I just want to know why the decorated one fails first. Thanks!

Answer 1

Consider the stack frames (function calls) that occur in your original code. They look something like:

lev(s, t)
-> lev(..., ...)
   -> lev(..., ...)
       -> lev(..., ...)
           -> lev(..., ...)

In your memoized version, they appear as:

wrap(s, t)
-> lev(..., ...)
   -> wrap(..., ...)
      -> lev(..., ...)
          -> wrap(..., ...)
             -> lev(..., ...)
                -> wrap(..., ...)
                   -> lev(..., ...)
                      -> wrap(..., ...)
                         -> lev(..., ...)

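That is, your stack will be twice as deep, because each "call" actually invokes two functions: the wrapper and then lev itself. You will therefore exhaust the stack frame limit sooner.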
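To see the doubling concretely, here is a minimal sketch that counts the Python frames on the stack at the deepest point of a simple recursion, with and without a memoize-style wrapper. The helper names (frame_depth, extra_frames, plain_countdown, wrapped_countdown) are made up for illustration, and sys._getframe is a CPython implementation detail, so treat this as a demonstration rather than portable code.

```python
import sys

def frame_depth():
    """Count the Python frames currently on the call stack (CPython-specific)."""
    depth = 0
    frame = sys._getframe()
    while frame is not None:
        depth += 1
        frame = frame.f_back
    return depth

def memoize(func):
    memo = {}
    def wrap(n):
        if n not in memo:
            memo[n] = func(n)
        return memo[n]
    return wrap

def plain_countdown(n):
    # Recurse n levels, then report how deep the stack is at the bottom.
    if n == 0:
        return frame_depth()
    return plain_countdown(n - 1)

@memoize
def wrapped_countdown(n):
    # Same recursion, but every call passes through wrap() first.
    if n == 0:
        return frame_depth()
    return wrapped_countdown(n - 1)

def extra_frames(func, n):
    # Frames added by calling func(n), relative to the caller's own depth.
    base = frame_depth()
    return func(n) - base

plain = extra_frames(plain_countdown, 50)
wrapped = extra_frames(wrapped_countdown, 50)
print(plain, wrapped)  # 51 102
```

The undecorated recursion adds one frame per level, while the decorated one adds two (wrap plus the wrapped function), which is why it reaches the recursion limit at roughly half the input size.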

Answer 2

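This looks like an infinite recursion problem, but it isn't. You are simply recursing very deeply, and the decorator makes it deeper.

Instead of calling the lev function you defined directly, every call goes through wrap, and wrap then calls lev. That makes your call stack twice as deep. Even without the decorator, you would run into this problem once your inputs got larger.

To fix this, you will probably have to switch to a non-recursive program structure, either by using a bottom-up dynamic programming style or by converting the recursion into iteration and maintaining a stack manually.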
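As a rough sketch of the bottom-up option, the version below fills the distance table row by row and keeps only the previous row in memory. The name lev_iterative is hypothetical, and this is one common way to write it rather than a drop-in replacement for the code in the question.

```python
def lev_iterative(s, t):
    """Levenshtein distance computed bottom-up, with no recursion."""
    # previous[j] = distance between the first i-1 chars of s and the first j chars of t
    previous = list(range(len(t) + 1))
    for i, s_char in enumerate(s, start=1):
        # Distance from the first i chars of s to the empty string is i deletions.
        current = [i]
        for j, t_char in enumerate(t, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # delete s_char
                               current[j - 1] + 1,       # insert t_char
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]

print(lev_iterative("Lorem ibsum +++", "Loren ipsum ..."))  # 5
```

Because there is no recursion, the call stack stays flat regardless of input length; the running time is still proportional to len(s) * len(t).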

Answer 3

While trying to understand your code, I made a few modifications. Nothing major, just a matter of preference.

I only changed one line:

if s[-1] is t[-1]:

to this:

if s[-1] == t[-1]:
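A short illustration of the difference between the two comparisons, as a sketch: `is` tests whether two names refer to the same object, while `==` tests whether the values are equal. CPython happens to reuse objects for many short strings, which is likely why the original comparison appeared to work, but that is an implementation detail and not something to rely on. The helper last_chars_match is hypothetical.

```python
def last_chars_match(s, t):
    # Compare by value; this is what the algorithm actually needs.
    return s[-1] == t[-1]

a = "flaw"
b = "".join(["fl", "aw"])  # equal value, but built at run time

print(a == b)  # True: the characters are the same
print(a is b)  # implementation-dependent; typically False in CPython
print(last_chars_match("lawn", "flan"))  # True: both end in "n"
```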

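As it is, your code ran without any recursion problems.

Edit: Testing it with the whole string you are using, I ran into the recursion limit problem as well. Sure enough, it goes deep.

Add the following two lines (import sys and the sys.setrecursionlimit call) at the top: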

import sys
sys.setrecursionlimit(10000) 

def memoize(func):
  memo = {}
  def wrap(s, t):
    if (s, t) not in memo:
      memo[(s, t)] = func(s, t)
    return memo[(s, t)]
  return wrap

@memoize
def lev(s, t):
    len_s, len_t = len(s), len(t)
    if len_s==0: return len_t
    if len_t==0: return len_s
    cost = 0 if s[-1] == t[-1] else 1
    return min(lev(s[:-1], t) + 1,
               lev(s, t[:-1]) + 1,
               lev(s[:-1], t[:-1]) + cost)

s = "Lorem ibsum +++"
t = "Loren ipsum ..."
print(lev(s, t))             # 5

Beyond that, since you are using Python 3 (as I can see from the question tags), you can use functools.lru_cache instead of a custom memoize function:

from functools import lru_cache

@lru_cache(maxsize=None)
def lev(s, t):
    ...
    ...
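For reference, a complete small example of this approach might look like the sketch below. The name lev_cached is hypothetical, and the body simply mirrors the recursive function shown earlier in this answer. Caching avoids recomputing subproblems, but it does not by itself make the recursion shallower, so very long inputs may still hit the recursion limit.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache keyed on the (s, t) arguments
def lev_cached(s, t):
    if not s:
        return len(t)
    if not t:
        return len(s)
    cost = 0 if s[-1] == t[-1] else 1
    return min(lev_cached(s[:-1], t) + 1,
               lev_cached(s, t[:-1]) + 1,
               lev_cached(s[:-1], t[:-1]) + cost)

print(lev_cached("kitten", "sitting"))  # 3
print(lev_cached.cache_info())          # hit/miss statistics for the cache
```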

Maximum recursion depth exceeded, but only when using a decorator
