Speeding up execution, Python

Time: 2014-02-14 13:10:12

Tags: python performance for-loop spell-checking

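for loops are very expensive at execution time. I am building a correction algorithm based on Peter Norvig's spelling corrector code. I modified it a bit and realized that running it over thousands of words takes far too long.

The algorithm checks edit distances 1 and 2 and corrects the word. I extended it to 3, which probably increases the time (I am not sure). Below is the final part, where the most frequent word is taken as the reference: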

def correct(word):
    candidates = (known([word]).union(known(edits1(word)))).union(known_edits2(word).union(known_edits3(word)) or [word]) # this is where the problem is

    candidate_new = []
    for candidate in candidates: #this statement isnt the problem
        if soundex(candidate) == soundex(word):
            candidate_new.append(candidate)
    return max(candidate_new, key=(NWORDS.get))

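At first it looked like the statement for candidate in candidates was increasing the execution time. You can easily look at Peter Norvig's code here. I have since found where the problem is. It is in the statement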

candidates = (known([word]).union(known(edits1(word)))
             ).union(known_edits2(word).union(known_edits3(word)) or [word])

where

def known_edits3(word):
    return set(e3 for e1 in edits1(word) for e2 in edits1(e1) 
                                      for e3 in edits1(e2) if e3 in NWORDS)  

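As you can see, known_edits3 contains 3 nested for loops, and each nesting level multiplies the number of strings that have to be generated, so the execution time grows steeply. known_edits2 has 2 for loops. So this is the culprit.

How can I minimize this expression? Could itertools.repeat help here?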
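
To get a feel for how quickly the work grows with each nested loop, here is a rough sketch. It assumes a Norvig-style edits1 over lowercase ASCII letters; edit_counts is a hypothetical helper introduced only for this illustration, and it stops at two levels because a third level would multiply the work once more.

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: lowercase ASCII words only


def edits1(word):
    """All strings one edit step away from `word` (Norvig-style sketch)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def edit_counts(word):
    """Hypothetical helper: distinct strings after one and two edit steps."""
    e1 = edits1(word)
    e2 = set(e for w in e1 for e in edits1(w))
    return len(e1), len(e2)


if __name__ == "__main__":
    n1, n2 = edit_counts("spell")
    print("one step:", n1, "two steps:", n2)
```

Every string produced at one level is fed through edits1 again at the next level, so the cost multiplies per level rather than growing by a fixed amount.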

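1 Answer:

Answer 0 (score: 2)

A few ways to improve performance:

  1. Use a list comprehension (or a generator)
  2. Don't compute the same thing on every iteration
  3. The code would then reduce to: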

    def correct(word):
        candidates = (known([word]).union(known(edits1(word)))).union(known_edits2(word).union(known_edits3(word)) or [word])
    
        # Compute soundex outside the loop
        soundex_word = soundex(word)
    
        # Option 1: list comprehension
        candidate_new = [candidate for candidate in candidates if soundex(candidate) == soundex_word]
    
        # Option 2: generator expression instead; this saves memory
        candidate_new = (candidate for candidate in candidates if soundex(candidate) == soundex_word)
    
        return max(candidate_new, key=(NWORDS.get))
    

    Another enhancement relies on the fact that you only need the max candidate:

    def correct(word):
        candidates = (known([word]).union(known(edits1(word)))).union(known_edits2(word).union(known_edits3(word)) or [word])
    
        soundex_word = soundex(word)
        max_candidate = None
        max_nword = 0
        for candidate in candidates:
            nword = NWORDS.get(candidate, 0)
            if soundex(candidate) == soundex_word and nword > max_nword:
                # Remember the best candidate and its count seen so far
                max_candidate = candidate
                max_nword = nword
        return max_candidate
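
The variants above can be tried end to end with a small self-contained sketch. Everything here is a stand-in chosen for illustration: the toy NWORDS table, the crude toy_soundex (first and last letter only, not a real soundex), and the function names pick_with_generator and pick_with_loop. Unlike the snippets above, both functions fall back to the input word when no candidate matches.

```python
# Toy word-frequency table standing in for NWORDS (assumption).
NWORDS = {"spell": 10, "spill": 4, "shelf": 7, "small": 10}


def toy_soundex(word):
    """Crude stand-in for a real soundex: first and last letter only."""
    return word[0] + word[-1]


def pick_with_generator(word, candidates):
    target = toy_soundex(word)  # hoisted: computed once, not per candidate
    matching = (c for c in candidates if toy_soundex(c) == target)
    # default=word avoids a ValueError when nothing matches (Python 3.4+)
    return max(matching, key=lambda c: NWORDS.get(c, 0), default=word)


def pick_with_loop(word, candidates):
    target = toy_soundex(word)
    best, best_count = word, 0
    for c in candidates:
        count = NWORDS.get(c, 0)
        # Keep only the best match seen so far; no intermediate list is built
        if toy_soundex(c) == target and count > best_count:
            best, best_count = c, count
    return best


if __name__ == "__main__":
    words = ["spill", "spell", "shelf", "small"]
    print(pick_with_generator("spel", words))
    print(pick_with_loop("spel", words))
```

Both versions call the sound-matching function once for the input word and once per candidate. Note that this only trims the filtering step; it does not reduce the cost of generating the candidates themselves.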