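I'm writing a program that takes 9 characters, creates all possible permutations of them, loads a dictionary file for each character, and builds a set of all possible words. What I need to do is compare all the permutations against the words and return the matches.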
import os, itertools
def parsed(choices):
    # Load the dictionary file for each letter (dicts/<letter>.txt) into one set
    mySet = set()
    location = os.getcwd()
    for item in choices:
        filename = os.path.join(location, "dicts", "%s.txt" % item)
        with open(filename) as f:
            mySet.update(f.read().splitlines())
    return mySet
def permutations(input):
    possibilities = []
    pospos = []
    for x in range(3, 9):
        pospos.append([''.join(i) for i in itertools.permutations(input, x)])
    for pos in pospos:
        for i in pos:
            possibilities.append(i)
    return possibilities
This is the problematic function:
def return_matches():
    matches = []
    words = parsed(['s', 'm', 'o', 'k', 'e', 'j', 'a', 'c', 'k'])
    pos = permutations(['s', 'm', 'o', 'k', 'e', 'j', 'a', 'c', 'k'])
    for item in pos:
        if item in words:
            matches.append(item)
    return matches
This code should return:
matches = ['a', 'om', 'ja', 'jo', ..., 'jacks', 'cokes', 'kecks', 'jokes', 'cakes', 'smoke', 'comes', 'makes', 'cameos']
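When I get this code working, it takes 10 to 15 minutes to finish. On the other hand, every attempt to make it finish within the time limit either works only with 5 or fewer characters or returns wrong results.
So my question is: how can I optimize this code so that it returns the correct results in under 30 seconds?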
Edit: http://www.mso.anu.edu.au/~ralph/OPTED/v003 is the site I'm getting the dictionary files from.
Answer 0 (score: 1)
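Storing all the permutations in a list before testing whether they are valid wastes RAM and time. Instead, test each permutation as it is generated, and save the valid ones in a set to eliminate duplicates.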
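Duplicates can occur because of the way itertools.permutations works:
"Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeat values in each permutation."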
Your input word "SMOKEJACK" contains 2 Ks, so every permutation that contains a K is generated twice.
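A quick way to see these duplicates is to permute a string with a repeated letter. This small sketch uses only itertools.permutations; the variable name tuples is just for illustration. It compares the 9 * 8 = 72 positional 2-letter permutations of SMOKEJACK with the number of distinct strings they produce.

```python
from itertools import permutations

# The two Ks occupy different positions, so itertools treats them as different elements.
print(list(permutations("KK")))  # [('K', 'K'), ('K', 'K')]

# All 2-letter permutations of SMOKEJACK, joined into strings.
tuples = ["".join(t) for t in permutations("SMOKEJACK", 2)]
print(len(tuples))       # 72 positional permutations (9 * 8)
print(len(set(tuples)))  # 57 distinct strings
# Strings containing a K show up twice; strings without a K show up once.
print(tuples.count("SK"), tuples.count("SM"))  # 2 1
```

Collecting the matches in a set, as the code below does, removes exactly these repeats.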
Anyway, here's some code that uses the SOWPODS Scrabble word list for English.
from itertools import permutations
# Get all the words from the SOWPODS file,
# seeded with the one-letter words 'A' and 'I'
all_words = set('AI')
fname = 'scrabble_wordlist_sowpods.txt'
with open(fname) as f:
    all_words.update(f.read().splitlines())
print(len(all_words))
choices = 'SMOKEJACK'
# Generate all permutations of `choices` from length 3 to 8,
# test each one as it is generated, and save the valid ones
# in a set to eliminate duplicates.
matches = set()
for n in range(3, 9):
    for t in permutations(choices, n):
        s = ''.join(t)
        if s in all_words:
            matches.add(s)
for i, s in enumerate(sorted(matches)):
    print('{:3} {}'.format(i, s))
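This code runs in about 2.5 seconds on my rather ancient 32-bit 2GHz machine running Python 3.6.0 on Linux. It's slightly faster on Python 2, because Python 2 strings are byte strings rather than Unicode.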
Answer 1 (score: 1)
Instead of generating all permutations of the letters, you should use a prefix tree, or trie, to keep track of all prefixes of valid words.
def make_trie(words):
    res = {}
    for word in words:
        d = res
        for c in word:
            d = d.setdefault(c, {})
        d["."] = None
    return res
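Here we use d["."] = None to mark the point where a prefix actually becomes a valid word. Building the trie may take a few seconds, but you only need to do it once.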
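To make the role of the "." marker concrete, here is a minimal sketch of two lookups against the same nested-dict trie. is_word and is_prefix are hypothetical helpers, not part of the answer. A word is valid only if its path ends on a node that carries ".", while a bare prefix only needs the path to exist.

```python
def make_trie(words):
    # Same construction as above: nested dicts, "." marks the end of a word.
    res = {}
    for word in words:
        d = res
        for c in word:
            d = d.setdefault(c, {})
        d["."] = None
    return res

def is_word(trie, word):
    # Hypothetical helper: follow the path, then check for the end-of-word marker.
    d = trie
    for c in word:
        if c not in d:
            return False
        d = d[c]
    return "." in d

def is_prefix(trie, prefix):
    # Hypothetical helper: True if at least one stored word starts with `prefix`.
    d = trie
    for c in prefix:
        if c not in d:
            return False
        d = d[c]
    return True

trie = make_trie(["cat", "cats", "act"])
print(is_word(trie, "cat"))   # True
print(is_word(trie, "ca"))    # False: "ca" is only a prefix
print(is_prefix(trie, "ca"))  # True
```

The recursive search relies on the is_prefix idea to prune: as soon as a letter has no child node, every permutation that starts that way is skipped at once.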
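Now we can go through our letters in a recursive function, checking for each letter whether it contributes to a valid prefix at the current stage of the recursion. (The rest = letters[:i] + letters[i+1:] part is not very efficient, but as we'll see, it doesn't really matter.)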
def find_words(trie, letters, prefix=""):
    if "." in trie:  # found a full valid word
        yield prefix
    for i, c in enumerate(letters):
        if c in trie:  # contributes to valid prefix
            rest = letters[:i] + letters[i+1:]
            for res in find_words(trie[c], rest, prefix + c):
                yield res  # all words starting with that prefix
Minimal example:
>>> trie = make_trie(["cat", "cats", "act", "car", "carts", "cash"])
>>> trie
{'a': {'c': {'t': {'.': None}}}, 'c': {'a': {'r': {'t': {'s': {'.': None}}, '.': None}, 's': {'h': {'.': None}}, 't': {'s': {'.': None}, '.': None}}}}
>>> set(find_words(trie, "acst"))
{'cat', 'act', 'cats'}
Or, with your 9 letters and the words from sowpods.txt:
with open("sowpods.txt") as words:
    trie = make_trie(map(str.strip, words))  # ~1.3 s on my system, only once
res = set(find_words(trie, "SMOKEJACK"))  # ~2 ms on my system
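Since you have repeated letters, you have to pipe the results through a set. After a total of 623 recursive calls to find_words (measured with a counter variable), this yields 153 words. Compare that with the 216,555 words in the sowpods.txt file and the 986,409 permutations of all 1- to 9-letter combinations that could form a valid word. So once the trie has been generated, res = set(find_words(...)) takes only a few milliseconds.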
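The 986,409 figure is the sum of P(9, k) = 9!/(9 - k)! for k = 1 to 9, that is, every positional permutation of the 9 letters at every length. The sketch below checks that arithmetic with math.perm (assumes Python 3.8+). It also shows one way to count recursive calls with a counter variable. The global calls is an assumed instrumentation detail, since the answer does not show its counter, and it runs on the toy word list from the minimal example because sowpods.txt is not reproduced here.

```python
import math

# Positional permutations of length 1..9 drawn from 9 letters.
total = sum(math.perm(9, k) for k in range(1, 10))
print(total)  # 986409

def make_trie(words):
    res = {}
    for word in words:
        d = res
        for c in word:
            d = d.setdefault(c, {})
        d["."] = None
    return res

calls = 0  # assumed counter variable, incremented once per recursive call

def find_words(trie, letters, prefix=""):
    global calls
    calls += 1
    if "." in trie:  # found a full valid word
        yield prefix
    for i, c in enumerate(letters):
        if c in trie:  # only recurse into letters that extend a valid prefix
            rest = letters[:i] + letters[i+1:]
            yield from find_words(trie[c], rest, prefix + c)

trie = make_trie(["cat", "cats", "act", "car", "carts", "cash"])
found = set(find_words(trie, "acst"))
print(sorted(found), calls)  # ['act', 'cat', 'cats'] 9
```

Only 9 calls are needed here, against sum(math.perm(4, k) for k in range(1, 5)) == 64 permutations of "acst", because branches such as "s..." and "t..." are cut off at the root.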
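You could also change the find_words function to use a mutable dictionary of letter counts instead of a string or list of letters. That way, no duplicates are generated and the function is called fewer times, but the overall running time does not change much.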
def find_words(trie, letters, prefix=""):
    if "." in trie:
        yield prefix
    for c in letters:
        if letters[c] and c in trie:
            letters[c] -= 1
            for res in find_words(trie[c], letters, prefix + c):
                yield res
            letters[c] += 1
Then call it like this (after import collections): find_words(trie, collections.Counter("SMOKEJACK"))
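To see the difference between the two versions, here is a small side-by-side sketch. The names find_words_list and find_words_counter are hypothetical renames of the string-based and Counter-based versions above, and the two-word trie with letters "KOK" is a made-up toy input with one repeated letter.

```python
import collections

def make_trie(words):
    res = {}
    for word in words:
        d = res
        for c in word:
            d = d.setdefault(c, {})
        d["."] = None
    return res

def find_words_list(trie, letters, prefix=""):
    # String version: each occurrence of a repeated letter is tried separately.
    if "." in trie:
        yield prefix
    for i, c in enumerate(letters):
        if c in trie:
            rest = letters[:i] + letters[i+1:]
            yield from find_words_list(trie[c], rest, prefix + c)

def find_words_counter(trie, letters, prefix=""):
    # Counter version: each distinct letter is tried once per level.
    if "." in trie:
        yield prefix
    for c in letters:
        if letters[c] and c in trie:
            letters[c] -= 1
            yield from find_words_counter(trie[c], letters, prefix + c)
            letters[c] += 1  # restore the count for the next branch

trie = make_trie(["KO", "OK"])
print(list(find_words_list(trie, "KOK")))                          # ['KO', 'OK', 'OK', 'KO']
print(list(find_words_counter(trie, collections.Counter("KOK"))))  # ['KO', 'OK']
```

Because the Counter is decremented and restored around each recursive call, consume the generator fully (for example with list or set) before reusing the same Counter; stopping halfway leaves some counts decremented.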