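The function below returns the number of words in a list that contain exactly the same characters as the input word. The order of the characters within a word does not matter. However, the list contains millions of words. What is the fastest and most efficient way to perform this search?
Example:
words_list = ['yek','lion','eky','ekky','kkey','opt']
If we match the word "key" against the words in the list, only "yek" and "eky" qualify (so the count is 2), because they consist of exactly the same characters as "key", regardless of order.
Here is the function I wrote: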
from itertools import permutations

def find_a4(words_list, word):
    # all possible permutations of the word that we are looking for
    # it's a set of words
    word_permutations = set(''.join(p) for p in permutations(word))
    word_size = len(word)
    count = 0
    for candidate in words_list:
        # in the case of word "key",
        # we only accept words that have 3 characters
        # and they are in the word_permutations
        if len(candidate) == word_size and candidate in word_permutations:
            count += 1
    return count
Answer 0 (score: 4)
Build a dictionary whose keys are the sorted versions of the words:
from collections import defaultdict

word_list = ['yek','lion','eky','ekky','kkey','opt']
word_index = defaultdict(set)
for word in word_list:
    idx = tuple(sorted(word))
    word_index[idx].add(word)
# word_index = {
#     ('e', 'k', 'y'): {'yek', 'eky'},
#     ('i', 'l', 'n', 'o'): {'lion'},
#     ('e', 'k', 'k', 'y'): {'kkey', 'ekky'},
#     ('o', 'p', 't'): {'opt'}
# }
Then, to query, you would do:
def find_a4(word_index, word):
    idx = tuple(sorted(word))
    return len(word_index[idx])
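Or, if you need the actual words rather than the count, change the last line to return word_index[idx].
Efficiency: once the index is built, each query runs in O(1) time on average with respect to the size of the list (plus the cost of sorting the query word itself).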
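For reference, the two steps above can be put together in one self-contained sketch. The helper names build_index and count_anagrams are made up for this illustration and are not part of the answer; the lookup also uses .get instead of indexing, on the assumption that you do not want a query for an unknown word to add an empty entry to the defaultdict.

```python
from collections import defaultdict


def build_index(words):
    """Group words by their sorted-character signature (hypothetical helper)."""
    index = defaultdict(set)
    for w in words:
        index[tuple(sorted(w))].add(w)
    return index


def count_anagrams(index, word):
    """Count indexed words made of exactly the same characters as `word`."""
    # .get avoids inserting an empty set for signatures that were never indexed
    return len(index.get(tuple(sorted(word)), ()))


word_index = build_index(['yek', 'lion', 'eky', 'ekky', 'kkey', 'opt'])
print(count_anagrams(word_index, 'key'))  # 2 ('yek' and 'eky')
print(count_anagrams(word_index, 'cat'))  # 0
```

Because the values are sets, a word that appears several times in the list is counted once. If duplicates should be counted, storing lists (or a plain counter per signature) instead of sets would be one way to do it.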
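Answer 1 (score: 2)
For a long string you will have n! permutations to search through. I would instead sort the strings before comparing them, which costs O(n log n) per string, and only sort and compare when the lengths match: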
def find_a4(words_list, word):
    word = ''.join(sorted(word))
    word_size = len(word)
    count = 0
    for word1 in words_list:
        if len(word1) == word_size:
            if word == ''.join(sorted(word1)):
                count += 1
    return count
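To give a rough sense of the n! growth mentioned above, here is a small sketch that counts the distinct permutations of "key" and prints the factorial for a few longer lengths. The lengths 5, 10, and 15 are arbitrary values chosen for illustration.

```python
from itertools import permutations
from math import factorial

word = 'key'
# Every ordering of the letters; the set removes duplicates from repeated letters
distinct = {''.join(p) for p in permutations(word)}
print(len(distinct))  # 6, i.e. 3! for three different letters

# Upper bound on the number of permutations for longer words
for n in (5, 10, 15):
    print(n, factorial(n))
```

Sorting a word of the same length takes on the order of n log n steps, which is why comparing sorted strings scales far better than building the permutation set.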