My question is whether it is possible to improve this code so that my defined word list can be searched against the whole word_list.txt file more quickly. I was told there is a way to do this by reading the file into a suitable data structure once, instead of re-scanning it for each of the 14 words.
word_list = ['serve','rival','lovely','caveat','devote',
             'irving','livery','selves','latvian','saviour',
             'observe','octavian','dovetail','Levantine']

def sorted_word(word):
    """Return the word with its letters sorted alphabetically."""
    list_chars = list(word)
    list_chars.sort()
    word_sort = ''.join(list_chars)
    return word_sort
print("Please wait for a few moments...")
print()

# Create an empty dictionary to store our words and their anagrams
dictionary = {}
for words in word_list:
    value = []  # Empty list of values for this key
    individual_word_string = words.lower()
    for word in open('word_list.txt'):
        word1 = word.strip().lower()  # Normalised form used for comparing
        # When the sorted letters are the same, the words are anagrams
        if sorted_word(individual_word_string) == sorted_word(word1):
            if word1[0] == 'v':
                value.append(word.strip())  # Keep the original spelling from the file
    tempDict = {individual_word_string: value}
    dictionary.update(tempDict)

# Print the dictionary
for key, value in dictionary.items():
    print("{:<10} = {:<}".format(key, value))
Because of new-user restrictions I cannot post an image of the results. For reference, the output should print, for each word, its anagrams that begin with 'v'. Any help improving this code would be much appreciated.
Answer 0 (score: 0)
If you have enough memory, you can store the values in a dictionary and then do hash lookups on it (very fast). The nice thing about this is that you can pickle it for reuse later (building the dict is slow; lookups are fast). If you have a very large data set, you may want to use map-reduce; disco-project is a good Python/Erlang framework that I would recommend.
word_list = ['serve','rival','lovely','caveat','devote',
             'irving','livery','selves','latvian','saviour',
             'observe','octavian','dovetail','Levantine']

print("Please wait for a few moments...")
print()

# Read the file once, grouping words by their sorted-letter key
anagrams = {}
for word in open('word_list.txt'):
    word = word.strip().lower()
    key = tuple(sorted(word))
    anagrams[key] = anagrams.get(key, []) + [word]

# Each query is now a single dictionary lookup instead of a file scan
for word in word_list:
    key = tuple(sorted(word.lower()))
    print("%s -> %s" % (word.lower(), anagrams.get(key, [])))
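The pickling mentioned above could look something like the sketch below: build the anagram index once, save it to disk, and load it on later runs instead of rebuilding. The cache filename `anagrams.pkl` and the helper name `load_anagrams` are assumptions for illustration, not part of the original answer.

```python
import os
import pickle

CACHE = 'anagrams.pkl'  # hypothetical cache filename

def load_anagrams(path='word_list.txt'):
    """Return the anagram index, loading it from the pickle cache if one exists."""
    if os.path.exists(CACHE):
        with open(CACHE, 'rb') as f:
            return pickle.load(f)          # fast path: skip the slow dict build
    anagrams = {}
    with open(path) as f:
        for word in f:
            word = word.strip().lower()
            key = ''.join(sorted(word))    # string key works the same as a tuple key
            anagrams.setdefault(key, []).append(word)
    with open(CACHE, 'wb') as f:
        pickle.dump(anagrams, f)           # cache the index for the next run
    return anagrams
```

The first call pays the cost of scanning the file; every call after that only deserialises the cached dict, which matches the "dict creation is slow, lookup is fast" trade-off described above.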