Question

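For loops are very expensive at execution time. I am building a correction algorithm based on Peter Norvig's spelling corrector. I modified it a bit and realized that running it over thousands of words takes far too long.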

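The algorithm checks edit distances 1 and 2 and corrects accordingly. I extended it to 3, which may increase the time (I'm not sure). Here is the final part, where the most frequent word is picked as the correction: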

def correct(word):
    candidates = (known([word]).union(known(edits1(word)))).union(known_edits2(word).union(known_edits3(word)) or [word]) # this is where the problem is

    candidate_new = []
    for candidate in candidates: #this statement isnt the problem
        if soundex(candidate) == soundex(word):
            candidate_new.append(candidate)
    return max(candidate_new, key=(NWORDS.get))

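It looked as if the statement for candidate in candidates was increasing the execution time. You can easily look at Peter Norvig's code by clicking here. I have since found where the problem is. It is in the statement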

candidates = (known([word]).union(known(edits1(word)))
             ).union(known_edits2(word).union(known_edits3(word)) or [word])

where

def known_edits3(word):
    return set(e3 for e1 in edits1(word) for e2 in edits1(e1) 
                                      for e3 in edits1(e2) if e3 in NWORDS)

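As you can see, known_edits3 contains three nested for loops, so the amount of work multiplies at each level. known_edits2 has two for loops. So this is the culprit.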
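The cost of the nested loops can be checked directly by counting how many strings each level iterates over. The sketch below assumes an `edits1` in the style of Norvig's published corrector (deletes, transposes, replaces and inserts over a lowercase alphabet); it is a reconstruction for illustration, not necessarily the exact code used here, and `count_generated` is a hypothetical helper. Each level applies `edits1` to every string of the previous level, so the work multiplies per level rather than growing by a fixed factor.

```python
import string

ALPHABET = string.ascii_lowercase

def edits1(word):
    """All strings one edit away from `word` (assumed Norvig-style definition)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits for c in ALPHABET if b]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

def count_generated(word, depth):
    """Hypothetical helper: innermost iterations of `depth` nested edits1 loops."""
    if depth == 0:
        return 1
    # Every string produced at this level spawns its own edits1 set below it.
    return sum(count_generated(e, depth - 1) for e in edits1(word))
```

Calling `count_generated(word, 1)`, `count_generated(word, 2)` and `count_generated(word, 3)` on a short word shows how quickly the third level grows; on realistic word lengths the third level runs into the millions of strings.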

How can I minimize this expression? Could itertools.repeat help with this?

Answer 1

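A few ways to improve performance:

Use a list comprehension (or a generator)
Don't compute the same thing on every iteration

The code then reduces to: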

def correct(word):
    candidates = (known([word]).union(known(edits1(word)))).union(known_edits2(word).union(known_edits3(word)) or [word])

    # Compute soundex outside the loop
    soundex_word = soundex(word)

    # List comprehension
    candidate_new = [candidate for candidate in candidates if soundex(candidate) == soundex_word]

    # Or a generator, which saves memory (use one or the other, not both)
    candidate_new = (candidate for candidate in candidates if soundex(candidate) == soundex_word)

    return max(candidate_new, key=(NWORDS.get))
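To make the saving from hoisting the soundex call visible, here is a minimal self-contained sketch. The `soundex` below is a hypothetical stand-in (first and last letter), not the real Soundex algorithm; `NWORDS` holds made-up frequencies; and the `calls` counter exists only to show how often the function runs. The names `correct_naive` and `correct_hoisted` are illustrative.

```python
calls = {"soundex": 0}  # counter used only to make the saving visible

# Made-up word frequencies standing in for NWORDS
NWORDS = {"spelling": 50, "spoiling": 7, "smelling": 12, "peeling": 30}

def soundex(word):
    """Hypothetical stand-in, NOT real Soundex: first and last letter."""
    calls["soundex"] += 1
    return word[0] + word[-1]

def correct_naive(word, candidates):
    # soundex(word) is re-evaluated once per candidate
    matches = [c for c in candidates if soundex(c) == soundex(word)]
    return max(matches, key=NWORDS.get, default=word)

def correct_hoisted(word, candidates):
    target = soundex(word)  # computed a single time
    # Generator: candidates are filtered lazily while max() consumes them
    matches = (c for c in candidates if soundex(c) == target)
    return max(matches, key=NWORDS.get, default=word)
```

With n candidates the naive version calls `soundex` 2n times and the hoisted version n + 1 times. The `default=word` argument (Python 3.4+) is an addition in this sketch so that an empty match list returns the input word instead of raising `ValueError`.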

Another enhancement relies on the fact that you only need the candidate with the MAX count:

def correct(word):
    candidates = (known([word]).union(known(edits1(word)))).union(known_edits2(word).union(known_edits3(word)) or [word])

    soundex_word = soundex(word)
    max_candidate = None
    max_nword = 0
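    for candidate in candidates:
        count = NWORDS.get(candidate, 0)  # treat unknown words as count 0
        if soundex(candidate) == soundex_word and count > max_nword:
            max_candidate = candidate
            max_nword = count  # remember the best count seen so far
    return max_candidate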
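Both versions above still build the union of every edit level before filtering, which is the line the question identifies as slow. One possible further step, sketched here as an assumption and not part of the answer above, is to try the cheap levels first and stop at the first one that yields a known word, which is how Norvig's published corrector chains its levels with `or`. `NWORDS` holds made-up frequencies, `edits1` is a Norvig-style reconstruction, and the `calls` counter and the name `candidates_short_circuit` are illustrative only.

```python
import string

ALPHABET = string.ascii_lowercase
NWORDS = {"spelling": 50, "selling": 20, "the": 100}  # made-up frequencies
calls = {"edits1": 0}  # counter used only to show how much work is skipped

def edits1(word):
    """All strings one edit away from `word` (assumed Norvig-style definition)."""
    calls["edits1"] += 1
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits for c in ALPHABET if b]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

def known(words):
    return {w for w in words if w in NWORDS}

def known_edits2(word):
    return {e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS}

def candidates_short_circuit(word):
    # `or` evaluates left to right and stops at the first non-empty set,
    # so known_edits2 only runs when the cheaper levels find nothing.
    return known([word]) or known(edits1(word)) or known_edits2(word) or {word}
```

Note that this changes behavior compared with the union: a deeper level is never consulted once a shallower one matches, so a later soundex filter may see fewer candidates. A third level could be appended to the chain in the same way, where it would only run for words that no closer level can explain.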

