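I have a loop that takes a large number of words, splits each word into letters, and appends them all to one big list.
Then I find the letter that occurs most often and, if it does not appear in the pattern string, store it in a two-slot list:
list[0] = the most frequent letter
list[1] = how many times it occurs
This loop is extremely inefficient. It works, but it takes about 25-30 seconds to return a value. Before that, it would just keep running and never return anything.
How can I make the code I wrote more efficient?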
```python
def choose_letter(words, pattern):
    list_of_letters = []
    first_letter = []  # first spot is the letter, second is how many times it appears
    second_letter = []  # first spot is letter, second how many times it appears
    max_appearances = ["letter", 0]
    for i in range(len(words)):  # splits up every word into letters
        list_of_letters.append(list(words[i]))
    list_of_letters = sum(list_of_letters, [])  # concatenates the lists within the list
    first_letter = list_of_letters.count(0)
    for j in list_of_letters:
        second_letter = list_of_letters.count(j)
        if second_letter >= max_appearances[1] and j not in pattern:
            max_appearances[0] = j
            max_appearances[1] = second_letter
        else:
            list_of_letters.remove(j)
    return max_appearances[0]
```
Answer 0 (score: 0)
One way to make it faster is to choose a better data structure. Here is a version using `collections.Counter`:
```python
from collections import Counter

def choose_letter(words, pattern):
    pattern = set(pattern)
    letters = (letter
               for word in words
               for letter in word
               if letter not in pattern)
    letters = Counter(letters)
    return letters.most_common(1)[0][0]

mywords = 'a man a plan a canal panama'.split()
vowels = 'aeiou'
assert choose_letter(mywords, vowels) == 'n'
```
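The question stores both the letter and its count (`list[0]` and `list[1]`). Since `most_common(1)` already returns a `(letter, count)` pair, a variant that builds that two-slot list might look like the sketch below. The name `choose_letter_with_count` is hypothetical, and returning `None` when every letter is excluded is an assumption; the `Counter` version above raises `IndexError` in that case because `most_common(1)` returns an empty list.

```python
from collections import Counter

def choose_letter_with_count(words, pattern):
    """Hypothetical helper: return [letter, count] for the most frequent
    letter not in pattern, mirroring the question's two-slot list.
    Returns None (an assumed choice) if every letter is excluded."""
    excluded = set(pattern)
    counts = Counter(letter
                     for word in words
                     for letter in word
                     if letter not in excluded)
    top = counts.most_common(1)  # a list holding at most one (letter, count) tuple
    if not top:
        return None
    letter, count = top[0]
    return [letter, count]

mywords = 'a man a plan a canal panama'.split()
print(choose_letter_with_count(mywords, 'aeiou'))  # ['n', 4]
```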
Here is one using `collections.defaultdict`:
```python
from collections import defaultdict

def choose_letter(words, pattern):
    pattern = set(pattern)
    counts = defaultdict(int)
    for word in words:
        for letter in word:
            if letter not in pattern:
                counts[letter] += 1
    return max(counts, key=counts.get)

mywords = 'a man a plan a canal panama'.split()
vowels = 'aeiou'
assert choose_letter(mywords, vowels) == 'n'
```
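A rough way to see why a counting data structure helps, assuming CPython 3: the question's loop calls `list.count` once per letter, and each call rescans the whole list, so the work grows quadratically with the number of letters, while `Counter` counts everything in a single pass. The sketch below times both on a made-up input; the helper names are hypothetical, and actual timings depend on the machine, so no numbers are claimed here.

```python
import timeit
from collections import Counter

def count_per_letter(letters):
    """Calls list.count for every element, as the question's loop does:
    each call rescans the whole list, so total work is about len(letters) ** 2."""
    return {c: letters.count(c) for c in letters}

def single_pass(letters):
    """Counts every element in one pass: total work grows linearly."""
    return dict(Counter(letters))

# Made-up input: 21 letters repeated 200 times (4,200 letters in total).
letters = list("a man a plan a canal panama".replace(" ", "")) * 200
assert count_per_letter(letters) == single_pass(letters)

for fn in (count_per_letter, single_pass):
    seconds = timeit.timeit(lambda: fn(letters), number=3)
    print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```

The gap widens as the input grows, which is consistent with the 25-30 seconds reported in the question for a large word list.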
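Answer 1 (score: 0)
You're doing a lot of looping and list manipulation that you don't need. Every time you call `count` or use `not in`, you force the program to walk through the list or string to find what you're looking for. Removing all those items from the list is also very expensive. A more elegant solution is to loop over the words/letters only once and count each letter's occurrences in a dictionary. That gives you a dictionary of character/count pairs; from there you can pull out the key/value pairs, sort that list, and look at the top two values.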
```python
from collections import defaultdict
from itertools import chain

def choose_letter(words, pattern=""):
    count_dict = defaultdict(int)  # all unknown keys default to 0
    for c in chain(*words):  # walks every character of every word in one pass
        count_dict[c] += 1
    # a set makes each "not in" check constant-time instead of scanning the pattern string
    excluded = set(pattern)
    filtered = [(char, count) for (char, count) in count_dict.items() if char not in excluded]
    filtered.sort(key=lambda pair: pair[1], reverse=True)  # most frequent first
    return filtered[0][0]
```
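The explanation mentions looking at the top two values, while the function returns only the single most frequent character. A hypothetical variant (`top_two_letters` is an illustrative name, not from the answer) that keeps the same counting and sorting steps but returns the first two `(char, count)` pairs could look like this:

```python
from collections import defaultdict
from itertools import chain

def top_two_letters(words, pattern=""):
    """Hypothetical variant: return the two most frequent (char, count)
    pairs whose char is not in pattern, most frequent first."""
    excluded = set(pattern)
    count_dict = defaultdict(int)
    for c in chain(*words):
        count_dict[c] += 1
    filtered = [(char, count) for char, count in count_dict.items() if char not in excluded]
    # sorting only touches one entry per distinct character, so it is cheap
    # compared with the counting pass over all the letters
    filtered.sort(key=lambda pair: pair[1], reverse=True)
    return filtered[:2]

mywords = 'a man a plan a canal panama'.split()
print(top_two_letters(mywords, 'aeiou'))
```

Here the first pair is `('n', 4)`; the second slot goes to one of the letters tied at 2, because Python's sort is stable and keeps tied entries in their existing order.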
If you don't want to get into argument unpacking, itertools, and defaultdict, you could write the counting loop as:
```python
count_dict = {}
for word in words:
    for char in word:
        # dict.get returns 0 for a character that has not been counted yet
        count_dict[char] = count_dict.get(char, 0) + 1
```
...in case you don't want to dig into argument unpacking.
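If you want to keep `itertools` but skip the argument unpacking, `itertools.chain.from_iterable(words)` yields the same characters as `chain(*words)` while taking the words as a single argument, so nothing is unpacked into a tuple first and `words` can even be a lazy generator. The sketch below assumes Python 3.4+ for `max(..., default=...)`; returning `None` when every letter is excluded is an assumed choice, since the version above raises `IndexError` in that case.

```python
from collections import defaultdict
from itertools import chain

def choose_letter(words, pattern=""):
    excluded = set(pattern)
    count_dict = defaultdict(int)  # all unknown keys default to 0
    # chain.from_iterable consumes the words one at a time, with no *words unpacking
    for c in chain.from_iterable(words):
        if c not in excluded:
            count_dict[c] += 1
    # default=None is returned only when no character survived the filter
    return max(count_dict, key=count_dict.get, default=None)

print(choose_letter('a man a plan a canal panama'.split(), 'aeiou'))  # n
```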