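Exercise 9.3 in this book asks the reader to find the combination of 5 forbidden letters that excludes the smallest number of words in this file.
Below is my solution to the first part; I don't think there is any problem with it: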
# return True if the word contains any letter in letters,
# otherwise return False
def contain(word, letters):
    for letter in letters:
        if letter in word:
            return True
    return False

# return the number of words that contain any letter in letters
def ncont(words, letters):
    count = 0
    for word in words:
        if contain(word, letters):
            count += 1
    return count
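But for the problem above I can only think of a brute-force algorithm that tries every possible combination; to be exact, there are 26! / (5! * 21!) = 65780 combinations. Here is the implementation: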
def get_lset(nlt, alphabet, cur_set):
    global min_n, min_set
    # once enough letters have been chosen, score the combination
    if nlt <= 0:
        cur_n = ncont(words, ''.join(cur_set))
        if min_n == -1 or cur_n < min_n:
            min_n = cur_n
            min_set = cur_set.copy()
        print(''.join(cur_set), cur_n, ' *->', min_n, ''.join(min_set))
    # otherwise pick the remaining letters recursively
    else:
        cur_set.append(None)
        for i in range(len(alphabet)):
            cur_set[-1] = alphabet[i]
            get_lset(nlt-1, alphabet[i+1:], cur_set)
        cur_set.pop()
Then call the function above like this:
import string

if __name__ == '__main__':
    min_n = -1
    min_set = []
    with open('words.txt', 'r') as fin:
        words = [line.strip() for line in fin]
    get_lset(5, list(string.ascii_lowercase), [])
    print(min_set, min_n)
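As a quick sanity check on the size of the search space, here is a minimal sketch that counts the 5-letter combinations in two ways. It assumes Python 3.8 or later for math.comb; the variable name n_combinations is only for illustration.

```python
import itertools
import math
import string

# number of ways to choose 5 letters out of 26, ignoring order
n_combinations = math.comb(26, 5)
print(n_combinations)  # 65780

# the same count, obtained by enumerating the combinations directly
enumerated = sum(1 for _ in itertools.combinations(string.ascii_lowercase, 5))
assert enumerated == n_combinations
```

Each of these combinations triggers a full pass over the word list in get_lset, which is where the time goes.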
But this solution is very slow. Is there a better algorithm for this problem? Any suggestions would be appreciated!
Answer 0 (score: 3)
First, let's rewrite it more concisely:
def contain(word, letters):
    return any(letter in word for letter in letters)

def ncont(words, letters):
    return sum(contain(word, letters) for word in words)
At the moment your algorithm has an average complexity of
O(len(letters) * len(a_word) * len(words))
  ---+----------------------   -+--------
     contain(word, letters)     ncont(words, letters)
We can use sets:
def contain(word, letters):
    return not set(letters).isdisjoint(set(word))
which reduces this to:
O(min(len(letters), len(a_word)) * len(words))
  ---+--------------------------   -+--------
     contain(word, letters)         ncont(words, letters)
according to https://wiki.python.org/moin/TimeComplexity
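To convince yourself that the set-based version is a drop-in replacement, here is a small sketch that compares the two on a few inputs. The names contain_any and contain_set and the sample pairs are made up for this check.

```python
def contain_any(word, letters):
    # generator version: tests each letter against the word in turn
    return any(letter in word for letter in letters)

def contain_set(word, letters):
    # set version: True when the word and the letters share a character
    return not set(letters).isdisjoint(word)

samples = [("hello", "xyz"), ("hello", "qle"), ("", "abc"), ("python", "")]
results = [(contain_any(w, ls), contain_set(w, ls)) for w, ls in samples]

# both versions should agree on every sample
for by_any, by_set in results:
    assert by_any == by_set
```

Note that isdisjoint accepts any iterable, so the word does not have to be converted to a set first.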
As for the second part, the algorithm is easier to understand using itertools:
import itertools
import string

def minimum_letter_set(words, n):
    # every way of choosing n letters from the alphabet
    attempts = itertools.combinations(string.ascii_lowercase, n)
    return min(attempts, key=lambda attempt: ncont(words, attempt))
However, we can do better:
def minimum_letter_set(words, n):
    # build a lookup table from each letter to the set of words it features in
    by_letter = {
        letter: {
            word
            for word in words
            if letter in word
        }
        for letter in string.ascii_lowercase
    }

    # allowing us to define a function that finds words that match multiple letters
    def matching_words(letters):
        return set.union(*(by_letter[l] for l in letters))

    # find all n-letter combinations
    attempts = itertools.combinations(string.ascii_lowercase, n)

    # and return the one that matches the fewest words
    return min(attempts, key=lambda a: len(matching_words(a)))
I don't believe this has a lower algorithmic complexity, but it certainly saves the repeated work of filtering the word list.
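If you want to check that the lookup table gives the same counts as filtering the word list directly, here is a rough sketch on a toy word list. The word list and n = 2 are made up to keep it fast, and it assumes the words are distinct, since the sets would silently drop duplicates.

```python
import itertools
import string

# toy word list, only for illustration (assumed to contain no duplicates)
words = ["apple", "banana", "cherry", "date", "fig", "grape"]

def ncont(words, letters):
    # direct count: words containing at least one of the letters
    return sum(any(l in w for l in letters) for w in words)

# lookup table: letter -> set of words containing that letter
by_letter = {l: {w for w in words if l in w} for l in string.ascii_lowercase}

def matching_words(letters):
    # union of the precomputed sets for the chosen letters
    return set.union(*(by_letter[l] for l in letters))

# the two approaches should agree on every 2-letter combination
mismatches = [
    attempt
    for attempt in itertools.combinations(string.ascii_lowercase, 2)
    if len(matching_words(attempt)) != ncont(words, attempt)
]
```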
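Answer 1 (score: 0)
Here is my idea:
First compute exclude[l], which maps each letter l to the set of words that the letter excludes.
Compute the union of the five smallest of these 26 sets. This gives you a fair "temporary minimum result".
Then, instead of using itertools.combinations to explore all combinations of 5 letters, write your own algorithm to do that, and compute the union of the "exclude" sets inside it. In this algorithm, if for the first i letters (i < 5) the "exclude" set is already larger than the "temporary minimum result", you don't need to consider the following letters at all. If you find a five-letter combination that is better than the current "temporary minimum result", update it.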
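The steps above could be sketched roughly as follows. This is only one possible reading of the idea, not a tuned implementation: the function name min_excluding_letters, the alphabet parameter, and trying the letters in order of increasing set size are my own assumptions, and it treats the words as distinct. It also prunes when a partial union is merely equal to the current best, since such a branch cannot give a strictly better result.

```python
import string

def min_excluding_letters(words, n=5, alphabet=string.ascii_lowercase):
    # exclude[l]: set of words that contain letter l
    exclude = {l: {w for w in words if l in w} for l in alphabet}
    # try letters that exclude few words first, so good bounds appear early
    letters = sorted(alphabet, key=lambda l: len(exclude[l]))

    # temporary minimum: union of the n individually smallest sets
    best_letters = tuple(letters[:n])
    best_count = len(set().union(*(exclude[l] for l in best_letters)))

    def search(start, chosen, excluded):
        nonlocal best_letters, best_count
        if len(chosen) == n:
            if len(excluded) < best_count:
                best_letters, best_count = tuple(chosen), len(excluded)
            return
        # stop early when too few letters remain to finish the combination
        for i in range(start, len(letters) - (n - len(chosen)) + 1):
            new_excluded = excluded | exclude[letters[i]]
            # prune: the union only grows, so this branch cannot beat the best
            if len(new_excluded) >= best_count:
                continue
            chosen.append(letters[i])
            search(i + 1, chosen, new_excluded)
            chosen.pop()

    search(0, [], set())
    return best_letters, best_count
```

With the word list loaded as in the question, the call would be min_excluding_letters(words), which returns the chosen letters together with the number of excluded words.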
Answer 2 (score: 0)
Here is my solution:
from string import ascii_lowercase

def smallest_set(filename):
    # for each letter, count the words that do not contain it
    avoid_dict = dict.fromkeys(ascii_lowercase, 0)
    with open(filename) as file_handler:
        for line in file_handler:
            for key in avoid_dict:
                if key not in line:
                    avoid_dict[key] += 1
    # letters that are avoided by the most words come first
    avoid_stats_sorted = sorted(avoid_dict, key=avoid_dict.get,
                                reverse=True)
    return ''.join([item for item in avoid_stats_sorted[:5]])