对字谜搜索的改进

时间:2012-10-19 09:08:41

标签: python python-3.x

我的问题是,是否有可能改进这些代码,以便我的内容 可以通过整个word_list.txt文件更快地搜索定义的单词列表。我被告知有一种方法可以通过将文件放入适当的数据结构中,对所有14个字重复一次。

word_list = ['serve','rival','lovely','caveat','devote',\
         'irving','livery','selves','latvian','saviour',\
         'observe','octavian','dovetail','Levantine']

def sorted_word(word):
    """This return the sorted word"""
    list_chars = list(word)
    list_chars.sort()
    word_sort = ''.join(list_chars)
    return word_sort

print("Please wait for a few moment...")
print()

#Create a empty dictionary to store our word and the anagrams
dictionary = {}
for words in word_list:
    value = [] #Create an empty list for values for the key
    individual_word_string = words.lower()

    for word in open ('word_list.txt'):
        word1 = word.strip().lower() #Use for comparing

        #When sorted words are the same, update the dictionary        
        if sorted_word(individual_word_string) == sorted_word(word1):
            if word1[0] == 'v':
                value.append(word.strip()) #Print original word in word_list
                tempDict = {individual_word_string:value}
                dictionary.update(tempDict)

#Print dictionary
for key,value in dictionary.items():
    print("{:<10} = {:<}".format(key,value))

由于新的用户限制,我无法发布结果的图像。顺便说一句,结果应该打印出以每个单词的v开头的字谜。很高兴为改进此代码提供任何帮助。

1 个答案:

答案 0 :(得分:0)

如果你有足够的内存,你可以尝试将值存储到字典中,然后对它进行哈希搜索(非常快)。关于这一点的好处是你可以将它腌制以便将来再次使用它(dict创建的过程很慢,查找速度很快)。 如果你有非常大的数据集,你可能想使用map reduce,disco-project是一个很好的python / erlang框架,我建议。

word_list = ['serve','rival','lovely','caveat','devote',\
         'irving','livery','selves','latvian','saviour',\
         'observe','octavian','dovetail','Levantine']

print("Please wait for a few moment...")
print()

anagrams = {}

for word in open ('word_list.txt'):
    word = word.strip().lower() #Use for comparing
    key = tuple(sorted(word))
    anagrams[key] = anagrams.get(key,[]) + [word]

for word in word_list:
    print "%s -> %s" % (word.lower(),aragrams[tuple(sorted(word.lower()))])