Reducing the running time of a loop and making it more efficient

Time: 2016-12-06 21:38:21

Tags: python performance loops processing-efficiency

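I have a loop that takes a large number of words, splits each word into letters, and appends them to one big list.

I then check which letter appears most often, and if it does not appear in the string pattern, I store it in a list with two slots:

list[0] = the letter that occurs most often

list[1] = how many times it occurs

This loop is extremely inefficient. It works, but it takes about 25-30 seconds to return a value. Until then it just keeps running and returns nothing.

How can I make the code I wrote more efficient?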

def choose_letter(words, pattern):
    list_of_letters = []
    first_letter = []  # first spot is the letter, second is how many times it appears
    second_letter =[]  # first spot is letter, second how many times it appears
    max_appearances = ["letter", 0]
    for i in range(len(words)):  # splits up every word into letters
        list_of_letters.append(list(words[i]))
    list_of_letters = sum(list_of_letters, [])   # concatenates the lists within the list
    first_letter = list_of_letters.count(0)
    for j in list_of_letters:
        second_letter = list_of_letters.count(j)
        if second_letter >= max_appearances[1] and j not in pattern:
            max_appearances[0] = j
            max_appearances[1] = second_letter
        else:
            list_of_letters.remove(j)
    return max_appearances[0]

2 answers:

Answer 0 (score: 0)

One way to make it faster is to choose a better data structure. Here is an example using collections.Counter:

from collections import Counter

def choose_letter(words, pattern):
    pattern = set(pattern)
    letters = (letter
               for word in words
               for letter in word
               if letter not in pattern)
    letters = Counter(letters)
    return letters.most_common(1)[0][0]


mywords = 'a man a plan a canal panama'.split()
vowels = 'aeiou'
assert choose_letter(mywords, vowels) == 'n'
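
The question asks for both the letter and its count, and `most_common(1)[0][0]` raises an IndexError when every letter is filtered out by the pattern. A minimal sketch that covers both points, assuming a hypothetical helper name `choose_letter_with_count` and assuming that returning None is acceptable when nothing is left to count:

```python
from collections import Counter

def choose_letter_with_count(words, pattern):
    # Hypothetical helper, not part of the original answer.
    excluded = set(pattern)  # set membership tests are O(1) on average
    counts = Counter(
        letter
        for word in words
        for letter in word
        if letter not in excluded
    )
    top = counts.most_common(1)  # [] when no letters survived the filter
    if not top:
        return None  # assumption: the caller can handle None
    letter, count = top[0]
    return [letter, count]  # same two-slot layout the question describes
```

With the sample words and vowels above, this would return the letter 'n' together with its count.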

Here is one that uses collections.defaultdict:

from collections import defaultdict

def choose_letter(words, pattern):
    pattern = set(pattern)
    counts = defaultdict(int)
    for word in words:
        for letter in word:
            if letter not in pattern:
                counts[letter] += 1
    return max(counts, key=counts.get)

mywords = 'a man a plan a canal panama'.split()
vowels = 'aeiou'
assert choose_letter(mywords, vowels) == 'n'

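Answer 1 (score: 0)

You are doing a lot of looping and list manipulation that you don't need. Every time you call count or use not in, you force the program to scan the whole list/string to find what you are looking for. Removing all those items from the list is also very expensive. A more elegant solution is to loop over the words/letters just once and use a dictionary to count the occurrences of each letter. From there you have a dictionary of character/count pairs; you can take its key/value pairs, sort the list, and look at the top two values.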

from collections import defaultdict
from itertools import chain

def choose_letter(words, pattern=""):
    count_dict = defaultdict(int) # all unknown values default to 0
    for c in chain(*words):
        count_dict[c] += 1
    # you could replace this "not in" with something more efficient
    filtered = [(char, count) for (char,count) in count_dict.items() if char not in pattern] 
    # sort by count, highest first (key= replaces the Python 2-only cmp argument)
    filtered.sort(key=lambda pair: pair[1], reverse=True)
    print(filtered)
    return filtered[0][0]
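
The comment in the code above hints that the "not in" test could be made cheaper, and the answer talks about looking at the top two values. One possible sketch of both ideas, assuming a hypothetical function name `top_letters` and a hypothetical parameter `n`; it swaps in a set for the membership test and `heapq.nlargest` in place of a full sort:

```python
import heapq
from collections import Counter

def top_letters(words, pattern="", n=2):
    # Hypothetical helper, not part of the original answer.
    excluded = set(pattern)  # O(1) average lookups instead of scanning a string
    counts = Counter(
        c for word in words for c in word if c not in excluded
    )
    # Pick the n (char, count) pairs with the largest counts
    # without sorting the whole list.
    return heapq.nlargest(n, counts.items(), key=lambda pair: pair[1])
```

The first pair in the result is the most frequent letter that is not in the pattern; an empty result means every letter was excluded.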

If you don't want to get into argument unpacking, itertools, and defaultdict, you could write:

count_dict = {}
for word in words:
    for char in word:
        count_dict[char] = count_dict.get(char, 0) + 1

...if you would rather not dig into argument unpacking.
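
The plain-dict fragment above only does the counting. A minimal sketch of how it might be wrapped into a complete function with no imports, assuming a hypothetical name `choose_letter_plain`, assuming the pattern is a short string, and assuming None is an acceptable result when no letter qualifies:

```python
def choose_letter_plain(words, pattern=""):
    # Hypothetical helper, not part of the original answer.
    count_dict = {}
    for word in words:
        for char in word:
            if char in pattern:
                # Skip letters already in the pattern; this scans the
                # pattern string, which is cheap only if it is short.
                continue
            count_dict[char] = count_dict.get(char, 0) + 1
    if not count_dict:
        return None  # assumption: nothing left to choose from
    # Return the key whose count is largest.
    return max(count_dict, key=count_dict.get)
```

Each letter is visited once, so the work grows linearly with the total number of letters rather than with its square.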