My question is whether it is possible to improve this code so that my defined word list can be searched against the whole word_list.txt file more quickly. I was told there is a way to do this by reading the file into a suitable data structure once, instead of re-scanning it for each of the 14 words.
word_list = ['serve','rival','lovely','caveat','devote',
             'irving','livery','selves','latvian','saviour',
             'observe','octavian','dovetail','Levantine']

def sorted_word(word):
    """Return the word with its letters sorted alphabetically."""
    list_chars = list(word)
    list_chars.sort()
    word_sort = ''.join(list_chars)
    return word_sort
print("Please wait for a few moments...")
print()

# Create an empty dictionary to store our words and their anagrams
dictionary = {}
for words in word_list:
    value = []  # Empty list of values for this key
    individual_word_string = words.lower()
    for word in open('word_list.txt'):
        word1 = word.strip().lower()  # Normalised form used for comparing
        # When the sorted letters are the same, the words are anagrams
        if sorted_word(individual_word_string) == sorted_word(word1):
            if word1[0] == 'v':
                value.append(word.strip())  # Keep the original spelling from the file
    tempDict = {individual_word_string: value}
    dictionary.update(tempDict)

# Print the dictionary
for key, value in dictionary.items():
    print("{:<10} = {:<}".format(key, value))
Because of new-user restrictions I cannot post an image of the results. For reference, the output should print, for each word, its anagrams that begin with 'v'. Any help improving this code would be much appreciated.
Answer 0 (score: 0)
If you have enough memory, you can store the values in a dictionary and then do hash lookups on it (very fast). The nice thing about this is that you can pickle it for reuse later (building the dict is slow; lookups are fast). If you have a very large data set, you may want to use map-reduce; disco-project is a good Python/Erlang framework that I would recommend.
word_list = ['serve','rival','lovely','caveat','devote',
             'irving','livery','selves','latvian','saviour',
             'observe','octavian','dovetail','Levantine']

print("Please wait for a few moments...")
print()

# Read the file once, grouping words by their sorted-letter key
anagrams = {}
for word in open('word_list.txt'):
    word = word.strip().lower()
    key = tuple(sorted(word))
    anagrams[key] = anagrams.get(key, []) + [word]

# Each query is now a single dictionary lookup instead of a file scan
for word in word_list:
    key = tuple(sorted(word.lower()))
    print("%s -> %s" % (word.lower(), anagrams.get(key, [])))
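The pickling mentioned above could look something like the sketch below: build the anagram index once, save it to disk, and load it on later runs instead of rebuilding. The cache filename `anagrams.pkl` and the helper name `load_anagrams` are assumptions for illustration, not part of the original answer.

```python
import os
import pickle

CACHE = 'anagrams.pkl'  # hypothetical cache filename

def load_anagrams(path='word_list.txt'):
    """Return the anagram index, loading it from the pickle cache if one exists."""
    if os.path.exists(CACHE):
        with open(CACHE, 'rb') as f:
            return pickle.load(f)          # fast path: skip the slow dict build
    anagrams = {}
    with open(path) as f:
        for word in f:
            word = word.strip().lower()
            key = ''.join(sorted(word))    # string key works the same as a tuple key
            anagrams.setdefault(key, []).append(word)
    with open(CACHE, 'wb') as f:
        pickle.dump(anagrams, f)           # cache the index for the next run
    return anagrams
```

The first call pays the cost of scanning the file; every call after that only deserialises the cached dict, which matches the "dict creation is slow, lookup is fast" trade-off described above.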