Question

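Hey. I know this is not a 'refactor my code' site, but I wrote this little piece of code that works fine with moderately sized input and becomes problematic with strings of, say, more than 2000 characters.

What it does: it takes a string of digits as a parameter and returns the number of ways it can be interpreted as a string of letters, where each letter of the English alphabet is assigned a numeric value according to its lexical position: A -> 1, B -> 2, Z -> 26, etc.

Since some letters are represented as two digits, the suffix tree is not unique, so there can be multiple interpretations. For example, '111' could be 'AAA', 'KA' or 'AK'.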

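Here is my code. It is fairly readable and straightforward, but it is problematic because:

It has to copy part of the string every time to pass it as the argument to the recursive call.
It has to store huge strings in the cache, so it is very expensive memory-wise.
... it is recursive.

Thanks a lot :)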

cache = dict()
def alpha_code(numbers):
    """
    Returns the number of ways a string of numbers
    can be interpreted as an alphabetic sequence.
    """
    global cache
    if numbers in cache: return cache[numbers]

    ## check the basic cases
    if numbers.startswith('0'): return 0
    if len(numbers) <= 1: return 1

    ## dynamic programming part

    ## obviously we can treat the first (non-zero)
    ## digit as a single letter and continue -
    ## '342...' -> C + '42...'
    total = alpha_code(numbers[1:])

    ## the first two digits make for a legal letter
    ## iff this condition holds
    ## '2511...' -> Y + '11...'
    ## '3711...' -> illegal
    if numbers[:2] <= '26':
        total += alpha_code(numbers[2:])

    cache[numbers] = total
    return total

Answer 1

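Try using a dynamic programming approach:

Create an array (call it 'P') with one element for each character in the string.
Initialize P[0] = 1 (unless the first character is 0, in which case just return 0 for the result).
Initialize P[1] by counting the valid readings of the first two characters: add 1 if the current character is non-zero, and add 1 if the first two characters together can be interpreted as a letter. That gives 2 for '11' and 1 for '10' or '37'; if neither holds (as in '30'), return 0 for the result.
Fill in the rest of the array from left to right, using the following rule (pseudocode):

P[x] = (if the current character is '0' then 0, else P[x-1]) + (if the previous character + current character can be interpreted as a letter then P[x-2], else 0)

(Note that if P[x] is ever 0 you should return zero, since that means the string contained something your rules don't seem to allow, such as two 0s in a row.)

The first part of the sum handles the case where the current character is interpreted as a letter by itself; the second part handles the case where the 2 most recent characters are interpreted together as one letter.

Essentially, P[x] is the number of ways the entire string from the start up to position x can be interpreted as letters. Since you can determine this by looking at previous results, you only need to loop through the contents of the string once: an O(N) pass rather than the O(2^N) of uncached recursion, which is a huge improvement. Your final result is simply P[len(input)-1], since "everything from the start up to the end" is the same as "the entire string".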
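The rule above could be written in Python roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the input contains only the characters '0'-'9', it returns 1 for the empty string to match the question's code, and it folds the two initialization steps into the loop.

```python
def alpha_code(numbers):
    """Count the ways a digit string can be read as letters (A=1 ... Z=26)."""
    n = len(numbers)
    if n == 0:
        return 1  # assumption: same result as the question's code for ''

    # P[x] = number of interpretations of numbers[:x + 1]
    P = [0] * n
    for x in range(n):
        ways = 0
        # Current character read as a single letter.
        if numbers[x] != '0':
            ways += P[x - 1] if x >= 1 else 1
        # Previous + current character read as one letter (10..26).
        # Comparing two-character digit strings orders them numerically.
        if x >= 1 and '10' <= numbers[x - 1:x + 1] <= '26':
            ways += P[x - 2] if x >= 2 else 1
        if ways == 0:
            return 0  # no valid reading can get past this position
        P[x] = ways
    return P[-1]
```

Because there is no recursion and nothing is cached by substring, a 3000-digit input is just 3000 loop iterations. Only the last two entries of P are ever read, so the list could be replaced by two variables if memory matters.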

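Example run for your basic input case of '111':

P[0] = 1 (since 1 is non-zero)

P[1] = 2 (since 11 is a valid letter, and 1 is also a valid letter)

P[2] = 3 (since the two most recent characters together are a valid letter, and the current character is non-zero, so P[0] + P[1] = 1 + 2 = 3)

Since P[2] is our last result, and it is 3, our answer is 3.

If the string were '1111' instead, we would continue for one more step:

P[3] = 5 (since the two most recent characters are a valid letter, and the current character is non-zero, so P[1] + P[2] = 2 + 3 = 5)

The answer is indeed 5: the valid interpretations are AAAA, KK, AKA, AAK and KAA. Notice how those 5 potential answers are built up from the potential interpretations of '11' and '111':

'11': AA or K
'111': AAA or KA or AK

'111' + A: AAA + A or KA + A or AK + A
'11' + K: AA + K or K + K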
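To check a walkthrough like this by hand, a brute-force enumerator that builds the readings exactly the way the breakdown does (prefix + single letter, shorter prefix + two-digit letter) can help. This is only a sketch for short inputs: `decodings` and `letter` are illustrative names, the input is assumed to contain only digits, and the running time is exponential.

```python
def decodings(numbers):
    """List every letter string that a short digit string can stand for."""
    def letter(code):
        # '1' -> 'A', '11' -> 'K', '26' -> 'Z'
        return chr(ord('A') + int(code) - 1)

    if numbers == '':
        return ['']
    results = []
    # Last digit read as a single letter: extend every reading of the prefix.
    if numbers[-1] != '0':
        results += [d + letter(numbers[-1]) for d in decodings(numbers[:-1])]
    # Last two digits read as one letter (10..26): extend the shorter prefix.
    if len(numbers) >= 2 and '10' <= numbers[-2:] <= '26':
        results += [d + letter(numbers[-2:]) for d in decodings(numbers[:-2])]
    return results


print(decodings('111'))   # ['AAA', 'KA', 'AK']
print(decodings('1111'))  # ['AAAA', 'KAA', 'AKA', 'AAK', 'KK']
```

The length of each list is the value the P array holds at the corresponding position.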

Answer 2

Recursion elimination is always a fun task. Here, I would focus on making sure the cache is correctly populated, and then just use it, as follows:

import collections

def alpha_code(numbers):
    # populate cache with all needed pieces
    cache = dict()
    pending_work = collections.deque([numbers])
    while pending_work:
      work = pending_work.popleft()
      # if cache[work] is known or easy, just go for it
      if work in cache:
        continue
      if work[:1] == '0':
        cache[work] = 0
        continue
      elif len(work) <= 1:
        cache[work] = 1
        continue
      # are there missing pieces? If so queue up the pieces
      # on the left (shorter first), the current work piece
      # on the right, and keep churning
      n1 = work[1:]
      t1 = cache.get(n1)
      if t1 is None:
        pending_work.appendleft(n1)
      if work[:2] <= '26':
        n2 = work[2:]
        t2 = cache.get(n2)
        if t2 is None:
          pending_work.appendleft(n2)
      else:
        t2 = 0
      if t1 is None or t2 is None:
        pending_work.append(work)
        continue
      # we have all pieces needed to add this one
      total = t1 + t2
      cache[work] = total

    # cache fully populated, so we know the answer
    return cache[numbers]

Answer 3

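It is possible to write a non-recursive algorithm, but I don't think it would be faster. I'm no Python expert, so I'll just give you an algorithm:

Convert the array of numbers to an array of letters using just A thru I and leaving the zeros in place.
Create two nested loops where you search and replace all the known pairs that represent larger letters. (AA -> K)

The advantage of this algorithm is that you can optimize the search/replace by first searching for and indexing all the As and Bs in the array.

Since you are using Python, whatever you do, you should convert the string to a list of numbers. Small integers such as 0-9 are cached objects in CPython, so they cost nothing to allocate. You could also create reusable character objects for A thru Z. Another benefit of a list is that a replace operation that removes two elements and inserts a single one is much faster than copying the string over and over.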
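As a rough illustration of the data this answer works with, the sketch below covers only the conversion step and the table of "known pairs"; it does not implement the nested search/replace loops or the counting itself. The names `to_letters` and `PAIRS` are made up for the example, and the input is assumed to contain only digits.

```python
def to_letters(numbers):
    """Step 1: map digits 1-9 to 'A'-'I' and leave zeros in place."""
    return [c if c == '0' else chr(ord('A') + int(c) - 1) for c in numbers]


# Lookup for step 2: every pair of step-1 symbols that can merge into a
# larger letter, e.g. 'AA' -> 'K' (11), 'A0' -> 'J' (10), 'BF' -> 'Z' (26).
PAIRS = {}
for value in range(10, 27):
    tens, ones = divmod(value, 10)
    first = chr(ord('A') + tens - 1)
    second = '0' if ones == 0 else chr(ord('A') + ones - 1)
    PAIRS[first + second] = chr(ord('A') + value - 1)

print(to_letters('1102'))  # ['A', 'A', '0', 'B']
print(PAIRS['AA'])         # K
```

Every key in PAIRS starts with 'A' or 'B', which is why indexing the As and Bs first narrows the search.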

Answer 4

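You can greatly reduce the memory footprint by not copying the string, and instead passing the original string along with the index of the first character to examine: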

def alpha_code(numbers, start_from=0):
    ....

You would then make the recursive calls like this:

alpha_code(numbers, start_from + 1)  # or start_from + 2, etc.

This way you keep the simplicity of the recursive algorithm and save a lot of memory.
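Filled in against the question's code, the idea might look like the sketch below. The `cache` parameter is an addition made for this example (it is not part of the signature above); it maps a start index to its count, so the cache holds small integers instead of substrings. The input is assumed to contain only digits.

```python
def alpha_code(numbers, start_from=0, cache=None):
    """Count interpretations of numbers[start_from:] without slicing it."""
    if cache is None:
        cache = {}  # hypothetical helper: start index -> number of readings
    if start_from in cache:
        return cache[start_from]

    remaining = len(numbers) - start_from
    # Same base cases as the question's code, expressed through the index.
    if remaining > 0 and numbers[start_from] == '0':
        return 0
    if remaining <= 1:
        return 1

    # First digit taken as a single letter.
    total = alpha_code(numbers, start_from + 1, cache)
    # First two digits taken as one letter (the first digit is non-zero here).
    if numbers[start_from:start_from + 2] <= '26':
        total += alpha_code(numbers, start_from + 2, cache)

    cache[start_from] = total
    return total
```

Note that this is still recursive, with a call depth that grows with the length of the input, so very long strings can still run into Python's recursion limit.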

How can I make this code handle large inputs more efficiently?
