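I am trying to write a Python function that generates anagrams of a given word. I am not just looking for code that rearranges the letters aimlessly: every option returned must be a real word. I currently have a solution (to be honest, I took most of the code from a YouTube video), but it is too slow for my purposes and can only return a single-word answer for a single word. It compares the word being searched against a 400,000-word dictionary file called "dict.txt".
My goal is to get the code below to mimic how this website works: https://wordsmith.org/anagram/
Looking at the network activity with Google Chrome's developer tools, I could not find any JavaScript code, so I assume the work is done on the back end, possibly with Node.js. That might make it faster than Python, but given how much faster it is, I believe there is more to it than the choice of programming language. I think they use some kind of search algorithm rather than going through the file line by line as I do. I also like that their answers are not limited to a single word but can be split into several words, which gives the user more options. For example, one anagram of "anagram" is "nag a ram".
Any suggestions or ideas would be appreciated.
Thanks.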
import random

def init_words(filename):
    words = {}
    with open(filename) as f:
        for line in f:
            word = line.strip()
            words[word] = 1
    return words

def init_anagram_dict(words):
    anagram_dict = {}
    for word in words:
        sorted_word = ''.join(sorted(list(word)))
        if sorted_word not in anagram_dict:
            anagram_dict[sorted_word] = []
        anagram_dict[sorted_word].append(word)
    return anagram_dict

def find_anagrams(word, anagram_dict):
    key = ''.join(sorted(list(word)))
    if key in anagram_dict:
        return set(anagram_dict[key]).difference(set([word]))
    return set([])

#This is the first function called.
def make_anagram(user_word):
    x = str(user_word)
    lower_user_word = str.lower(x)
    word_dict = init_words('dict.txt')
    result = find_anagrams(lower_user_word, init_anagram_dict(word_dict.keys()))
    list_result = list(result)
    count = len(list_result)
    if count > 0:
        random_num = random.randint(0,count -1)
        anagram_value = list_result[random_num]
        return ('An anagram of %s is %s. Would you like me to search for another word?' %(lower_user_word, anagram_value))
    else:
        return ("Sorry, I could not find an anagram for %s." %(lower_user_word))
Answer 0 (score: 1)
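You can build an anagram dictionary by grouping words under their sorted letters. All words that share the same sorted text are anagrams of one another: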
from collections import defaultdict

with open("/usr/share/dict/words","r") as wordFile:
    words = wordFile.read().split("\n")

anagrams = defaultdict(list)
for word in words:
    anagrams["".join(sorted(word))].append(word)

aWord = "spear"
result = anagrams["".join(sorted(aWord))]
print(aWord,result)
# spear ['asper', 'parse', 'prase', 'spaer', 'spare', 'spear']
With 235,000 words, the response time is instantaneous.
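The lookup stays instantaneous only if the index is built once and reused, whereas the question's make_anagram re-reads the file and rebuilds the index on every call. The following is a minimal sketch of that idea, not a drop-in replacement: `signature`, `build_index`, `lookup` and `SAMPLE_LINES` are hypothetical names, and the short in-memory list stands in for the real word file. It also lowercases entries and skips blank lines, on the assumption that the word file may contain capitalised words and a trailing empty line.

```python
from collections import defaultdict

def signature(word):
    """Case-insensitive anagram key: the word's letters, lowercased and sorted."""
    return "".join(sorted(word.lower()))

def build_index(lines):
    """Group words by signature; meant to run once, not on every lookup."""
    index = defaultdict(set)
    for line in lines:
        word = line.strip().lower()
        if word:                          # skip blank lines
            index[signature(word)].add(word)
    return index

# Hypothetical stand-in for the lines of a dictionary file.
SAMPLE_LINES = ["Spear\n", "spare\n", "parse\n", "listen\n", "silent\n", "\n"]

# Built a single time at start-up and shared by every lookup afterwards.
INDEX = build_index(SAMPLE_LINES)

def lookup(word, index=INDEX):
    """Return the other indexed words that use exactly the same letters."""
    word = word.lower()
    return sorted(index.get(signature(word), set()) - {word})

print(lookup("SPEAR"))   # ['parse', 'spare']
```

Each call to `lookup` then costs one sort of the input word plus one dictionary access, regardless of how many words were indexed.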
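To get anagrams made of several words that together use the letters of the given word, you need to get into combinatorics. A recursive function is probably the simplest approach: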
from itertools import combinations,product
from collections import Counter,defaultdict

with open("/usr/share/dict/words","r") as wordFile:
    words = wordFile.read().split("\n")

anagrams = defaultdict(set)
for word in words:
    anagrams["".join(sorted(word))].add(word)

counters = { w:Counter(w) for w in anagrams }

minLen = 2 # minimum word length

def multigram(word,memo=dict()): # the shared default dict acts as a cache across calls
    sWord = "".join(sorted(word))
    if sWord in memo: return memo[sWord]
    result = set(anagrams.get(sWord,())) # copy, so the index itself is not modified
    wordCounts = counters.get(sWord,Counter(word)) # count the letters directly when they do not form a word
    for size in range(minLen,len(word)-minLen+1):
        seen = set()
        for combo in combinations(word,size):
            left = "".join(sorted(combo))
            if left in seen or seen.add(left): continue # skip letter groups already tried
            left = multigram(left,memo)
            if not left: continue
            right = multigram("".join((wordCounts-Counter(combo)).elements()),memo)
            if not right: continue
            result.update(a+" "+b for a,b in product(left,right) )
    memo[sWord] = list(result)
    return memo[sWord]
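This works well for words of up to about 12 characters. Beyond that, the exponential nature of the combinations starts to take a heavy toll: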
result = multigram("spear")
print(result)
# ['parse', 'asper', 'spear', 'er spa', 're spa', 'se rap', 'er sap', 'sa per', 're asp', 'ar pes', 'se par', 'pa ers', 're sap', 'er asp', 'as per', 'spare', 'spaer', 'as rep', 'sa rep', 'ra pes', 'pa ser', 'es rap', 'es par', 'prase']
len(multigram("mulberries")) # 15986 0.1 second 10 letters
len(multigram("raspberries")) # 60613 0.2 second 11 letters
len(multigram("strawberries")) # 374717 1.3 seconds 12 letters
len(multigram("tranquillizer")) # 711491 7.6 seconds 13 letters
len(multigram("communications")) # 10907666 52.2 seconds 14 letters
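The growth in the timings above comes from enumerating letter combinations of the input. One possible way to shrink the search, which is a different technique from the recursive function in this answer, is to first discard every dictionary word that cannot be spelled from the target's letters and then recurse on the leftover letters. The sketch below only illustrates the idea: `fits`, `multiword` and the tiny `WORDS` list are hypothetical, and a real run would load a full word list instead.

```python
from collections import Counter

# Hypothetical mini-dictionary; a real run would load a full word list.
WORDS = ["spa", "sap", "asp", "er", "re", "spear", "pear", "zebra"]

def fits(word_count, pool):
    """True if every letter of the word is still available in the pool."""
    return all(pool[ch] >= n for ch, n in word_count.items())

def multiword(pool, candidates, start=0):
    """Yield tuples of words that use up the letter pool exactly.

    Words are picked in non-decreasing index order, so each combination
    is produced once instead of once per ordering.
    """
    if not pool:                      # every letter consumed: a complete anagram
        yield ()
        return
    for i in range(start, len(candidates)):
        word, count = candidates[i]
        if fits(count, pool):
            # Counter subtraction drops letters whose count reaches zero.
            for rest in multiword(pool - count, candidates, i):
                yield (word,) + rest

target = Counter("spear")
# Prune once: keep only words that can be spelled from the target's letters.
candidates = [(w, Counter(w)) for w in WORDS if fits(Counter(w), target)]

for combo in multiword(target, candidates):
    print(" ".join(combo))
```

With this sample list the pruning step removes "zebra" before any recursion happens, and "spa er" is reported without also reporting "er spa". Whether this beats the memoised version on a full dictionary has not been measured here.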
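To avoid any delay, the function can be converted into an iterator. This lets you get the first few anagrams without having to generate all of them: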
def iMultigram(word,prefix=""):
    sWord = "".join(sorted(word))
    seen = set()
    for anagram in anagrams.get(sWord,[]):
        full = prefix+anagram
        if full in seen or seen.add(full): continue
        yield full
    wordCounts = counters.get(sWord,Counter(word))
    for size in reversed(range(minLen,len(word)-minLen+1)): # longest first
        for combo in combinations(sWord,size):
            left = "".join(sorted(combo))
            if left in seen or seen.add(left): continue
            for left in iMultigram(left,prefix):
                right = "".join((wordCounts-Counter(combo)).elements())
                for full in iMultigram(right,left+" "):
                    if full in seen or seen.add(full): continue
                    yield full

from itertools import islice
list(islice(iMultigram("communications"),5)) # 0.0 second
# ['communications', 'cinnamomic so ut', 'cinnamomic so tu', 'cinnamomic os ut', 'cinnamomic os tu']