Question

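I have a loop that takes a large number of words, splits each word into letters, and appends them to one big list.

Then I check which letter appears most often and, if it doesn't appear in the string, I store it in a list that has two slots:

list[0] = the letter that occurs most often

list[1] = how many times it occurs

This loop is extremely inefficient. It works, but it takes about 25-30 seconds to return a value. Until then it just keeps running without returning anything.

How can I make the code I wrote more efficient?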

def choose_letter(words, pattern):
    list_of_letters = []
    first_letter = []  # first spot is the letter, second is how many times it appears
    second_letter =[]  # first spot is letter, second how many times it appears
    max_appearances = ["letter", 0]
    for i in range(len(words)):  # splits up every word into letters
        list_of_letters.append(list(words[i]))
    list_of_letters = sum(list_of_letters, [])   # concatenates the lists within the list
    first_letter = list_of_letters.count(0)
    for j in list_of_letters:
        second_letter = list_of_letters.count(j)
        if second_letter >= max_appearances[1] and j not in pattern:
            max_appearances[0] = j
            max_appearances[1] = second_letter
        else:
            list_of_letters.remove(j)
    return max_appearances[0]

Answer 1

One way to make it faster is to choose a better data structure. Here is an example using collections.Counter:

from collections import Counter

def choose_letter(words, pattern):
    pattern = set(pattern)
    letters = (letter
               for word in words
               for letter in word
               if letter not in pattern)
    letters = Counter(letters)
    return letters.most_common(1)[0][0]


mywords = 'a man a plan a canal panama'.split()
vowels = 'aeiou'
assert choose_letter(mywords, vowels) == 'n'
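
The question also asks for how many times the winning letter occurs. A minimal sketch of that variant follows. The name `choose_letter_with_count` and the choice to return `None` when every letter is excluded are assumptions made for illustration, not part of the original answer.

```python
from collections import Counter

def choose_letter_with_count(words, pattern):
    """Return (letter, count) for the most frequent letter not in pattern.

    Hypothetical helper for illustration; returns None when no letter qualifies.
    """
    excluded = set(pattern)  # set membership tests are O(1) on average
    counts = Counter(
        letter
        for word in words
        for letter in word
        if letter not in excluded
    )
    top = counts.most_common(1)  # empty list when nothing was counted
    return top[0] if top else None

mywords = 'a man a plan a canal panama'.split()
result = choose_letter_with_count(mywords, 'aeiou')
```

Returning the pair as a tuple keeps the letter and its count together, which matches the two-slot list described in the question.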

Here is one using collections.defaultdict:

from collections import defaultdict

def choose_letter(words, pattern):
    pattern = set(pattern)
    counts = defaultdict(int)
    for word in words:
        for letter in word:
            if letter not in pattern:
                counts[letter] += 1
    return max(counts, key=counts.get)

mywords = 'a man a plan a canal panama'.split()
vowels = 'aeiou'
assert choose_letter(mywords, vowels) == 'n'
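
One edge case worth knowing about: if every letter is in `pattern`, or `words` is empty, `max` raises `ValueError` because the dictionary is empty. The sketch below shows one possible way to handle that, assuming Python 3.4 or later for the `default` argument of `max`. The name `choose_letter_safe` is made up for this example.

```python
from collections import defaultdict

def choose_letter_safe(words, pattern):
    """Like the defaultdict version, but returns None when nothing is counted."""
    excluded = set(pattern)
    counts = defaultdict(int)  # missing letters start at 0
    for word in words:
        for letter in word:
            if letter not in excluded:
                counts[letter] += 1
    # default=None avoids ValueError when counts is empty
    return max(counts, key=counts.get, default=None)
```

Whether `None` is the right fallback depends on the caller; raising a more descriptive exception would be an equally reasonable choice.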

Answer 2

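You're doing a lot of looping and list manipulation that you don't need. Every time you call count or use not in, you force the program to loop through the whole list or string to find what you're looking for. Removing all those items from the list is also quite expensive. A more elegant solution is to loop over the words and letters only once and use a dictionary to count the occurrences of each letter. From there, you have a dictionary of character/count pairs; you can pull out the key/value pairs, sort the list, and look at the first two values.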

from collections import defaultdict
from itertools import chain

def choose_letter(words, pattern=""):
    count_dict = defaultdict(int) # all unknown values default to 0
    for c in chain(*words):
        count_dict[c] += 1
    # you could replace this "not in" with something more efficient
    filtered = [(char, count) for (char, count) in count_dict.items() if char not in pattern]
    # sort by count (index 1), highest first
    filtered.sort(key=lambda pair: pair[1], reverse=True)
    print(filtered)
    return filtered[0][0]
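
If only the first two values matter, sorting the whole list is more work than needed. As a rough sketch, `heapq.nlargest` from the standard library can pick the top entries directly. The function name `top_letters` and its `n` parameter are assumptions for illustration and do not come from the answer above.

```python
import heapq
from collections import defaultdict
from itertools import chain

def top_letters(words, pattern="", n=2):
    """Return up to n (letter, count) pairs, highest count first.

    Hypothetical helper for illustration only.
    """
    excluded = set(pattern)
    count_dict = defaultdict(int)
    for c in chain.from_iterable(words):  # one pass over every letter
        if c not in excluded:
            count_dict[c] += 1
    # nlargest avoids sorting every pair when only a few are needed
    return heapq.nlargest(n, count_dict.items(), key=lambda pair: pair[1])

mywords = 'a man a plan a canal panama'.split()
top_two = top_letters(mywords)
```

With at most a few dozen distinct letters the difference from a full sort is negligible, so this is mainly a matter of stating the intent clearly.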

If you don't want to delve into argument unpacking, itertools, and defaultdict, you could write:

count_dict = {}
for word in words:
    for char in word:
        count_dict[char] = count_dict.get(char, 0) + 1

...if you'd rather not dig into argument unpacking.
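
For completeness, here is a sketch of how that plain-dict fragment might fit into a whole function. The name `choose_letter_plain` and the `None` return for an empty result are assumptions made for this example.

```python
def choose_letter_plain(words, pattern=""):
    """Most frequent letter not in pattern, using only a plain dict."""
    excluded = set(pattern)  # faster membership tests than a string
    count_dict = {}
    for word in words:
        for char in word:
            if char not in excluded:
                # get() supplies 0 the first time a letter is seen
                count_dict[char] = count_dict.get(char, 0) + 1
    if not count_dict:
        return None  # nothing left after filtering
    return max(count_dict, key=count_dict.get)
```

This makes a single pass over the letters, so the running time grows linearly with the total number of characters rather than quadratically as with repeated `count` and `remove` calls.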

Reducing the running time of a loop and making it more efficient
