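I am writing a Python program that finds every dictionary word a jumbled word can be unscrambled into.
Here is what I have written. It runs in O(n^2) time. So my question is: can it be made faster?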
import sys

dictfile = "dictionary.txt"

def get_words(text):
    """ Return a list of dict words """
    return text.split()

def get_possible_words(words, jword):
    """ Return a list of possible solutions """
    possible_words = []
    jword_length = len(jword)
    for word in words:
        jumbled_word = jword
        if len(word) == jword_length:
            letters = list(word)
            # Strike each letter of the candidate out of the jumble
            for letter in letters:
                if jumbled_word.find(letter) != -1:
                    jumbled_word = jumbled_word.replace(letter, '', 1)
            # Nothing left over means the candidate uses exactly the same letters
            if not jumbled_word:
                possible_words.append(word)
    return possible_words

if __name__ == '__main__':
    words = get_words(open(dictfile).read())
    if len(sys.argv) != 2:
        print("Incorrect Format. Type like")
        print("python %s <jumbled word>" % sys.argv[0])
        sys.exit()
    jumbled_word = sys.argv[1]
    words = get_possible_words(words, jumbled_word)
    print("possible words :")
    print('\n'.join(words))
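Answer 0 (score: 1)
The usual fast solution to anagram problems is to build a mapping from sorted letters to the list of unsorted words that share them.
With that structure in place, each lookup is a single dictionary access: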
def build_table(wordlist):
    table = {}
    for word in wordlist:
        key = ''.join(sorted(word))
        table.setdefault(key, []).append(word)
    return table

def lookup(jumble, table):
    key = ''.join(sorted(jumble))
    return table.get(key, [])

if __name__ == '__main__':
    # Build table
    with open('/usr/share/dict/words') as f:
        wordlist = f.read().lower().split()
    table = build_table(wordlist)

    # Solve some jumbles
    for jumble in ['tesb', 'amgaarn', 'lehsffu', 'tmirlohag']:
        print(lookup(jumble, table))
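To see the shape of the table without a system dictionary, here is a minimal sketch of the same idea. The word list `sample_words` is made up for illustration, and `collections.defaultdict` is swapped in for `setdefault`; neither comes from the answer above.

```python
from collections import defaultdict


def build_table(wordlist):
    """Map each sorted-letter key to the words that share those letters."""
    table = defaultdict(list)
    for word in wordlist:
        table[''.join(sorted(word))].append(word)
    return table


# Made-up sample list standing in for a real dictionary file (assumption)
sample_words = ['state', 'taste', 'stone', 'notes', 'best', 'bets']
table = build_table(sample_words)

print(table['aestt'])                   # ['state', 'taste']
print(table[''.join(sorted('tesb'))])   # ['best', 'bets']
```

One caveat with this variant: indexing a `defaultdict` with a missing key inserts an empty list, so `table.get(key, [])` is the safer call when looking up jumbles that may have no match.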
A note on speed:
Text file format (the sorted letters first, followed by the matching words):
aestt state taste tates testa
enost seton steno stone
...
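With a preprocessed anagram file, it becomes simple to use subprocess to grep the file for the line holding the matching words. This should give a very fast run time, because the sorting and matching are precomputed and because grep is so fast.
Build the preprocessed anagram file like this: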
with open('/usr/share/dict/words') as f:
    wordlist = f.read().split()

table = {}
for word in wordlist:
    # Lowercase before sorting so capitalized words get the same key
    key = ''.join(sorted(word.lower()))
    table[key] = table.get(key, '') + ' ' + word

# Each line is the sorted-letter key followed by the space-separated words
lines = ['%s%s\n' % t for t in table.items()]
with open('anagrams.txt', 'w') as f:
    f.writelines(lines)
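The grep lookup described above is not shown in the answer, so here is one way it might look. This is only a sketch: the function name `grep_lookup` is hypothetical, it assumes a `grep` binary on the PATH and Python 3.7+ (for `capture_output`), and it assumes the file uses the key-then-words format shown earlier.

```python
import subprocess


def grep_lookup(jumble, path='anagrams.txt'):
    """Return the words stored on the line whose key matches the jumble."""
    key = ''.join(sorted(jumble.lower()))
    # Only letters are allowed, so the key is safe to embed in a regex
    if not key.isalpha():
        return []
    # '^key ' anchors on the whole key; -m 1 stops at the first matching line
    result = subprocess.run(
        ['grep', '-m', '1', '^' + key + ' ', path],
        capture_output=True,
        text=True,
    )
    # grep exits with status 1 and prints nothing when no line matches
    if not result.stdout:
        return []
    # Drop the leading key and keep the words after it
    return result.stdout.split()[1:]
```

Calling `grep_lookup('notes')` against a file containing the `enost` line above would return the words on that line. Whether spawning a process per lookup beats loading the table into memory depends on how many jumbles are solved per run, so it is worth measuring.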
Answer 1 (score: 0)
I tried solving it in Ruby -
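Answer 2 (score: 0)
Change get_words to return a dict(). Make the value of each key True or 1.
Import itertools and use itertools.permutations to create all the possible anagram strings from "jumbled_word" (itertools.combinations ignores order, so it cannot produce rearrangements).
Then loop over the candidate strings and check whether each one is a key in the dict.
If you want a DIY algorithmic solution, loading the dictionary into a tree might be "better", but I doubt it would be any faster in the real world.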
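The steps in this answer can be sketched as follows. This is an illustration under assumptions, not the answerer's code: the function name `solve_by_permutations` is hypothetical, a `set` stands in for the dict with dummy values, and `itertools.permutations` is used because reorderings of the letters are what is needed.

```python
from itertools import permutations


def solve_by_permutations(jumble, dictionary):
    """Return the dictionary words that are rearrangements of the jumble."""
    # Collect candidates in a set so repeated letters do not yield duplicates
    candidates = {''.join(p) for p in permutations(jumble)}
    # Membership tests against a set (or dict) take constant time on average
    return sorted(word for word in candidates if word in dictionary)


# Made-up sample dictionary (assumption)
sample_dictionary = {'best', 'bets', 'test', 'stone'}
print(solve_by_permutations('tesb', sample_dictionary))  # ['best', 'bets']
```

A word of n letters has up to n! orderings, so this approach is only practical for short jumbles; a 9-letter word already produces 362,880 candidates.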