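Finding all matching permutations within a time limit

Time: 2017-02-13 13:32:32

Tags: python performance python-3.x permutation

I am writing a program that takes 9 characters, creates all possible permutations of them, grabs a dictionary file for each character, and builds a set of all possible words from those files. What I need to do is compare every permutation against the words and return the matches.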

import os, itertools

def parsed(choices): 
    mySet = set()
    location = os.getcwd()
    for item in choices: 
        filename = location + "\\dicts\\%s.txt" % (item)
        mySet.update(open(filename).read().splitlines())

    return mySet  

def permutations(input): 
    possibilities = []
    pospos = []   

    for x in range(3,9):
        pospos.append([''.join(i) for i in itertools.permutations(input, x)])


    for pos in pospos: 
        for i in pos: 
            possibilities.append(i)
    return possibilities

The function that causes the problem is this one:

def return_matches(): 
    matches = []
    words = parsed(['s','m','o','k','e', 'j', 'a', 'c', 'k'])
    pos = permutations(['s','m','o','k','e', 'j', 'a', 'c', 'k'])

    for item in pos:  
        if item in words: 
            matches.append(item)

    return matches

This code should return:

matches = ['a', 'om', 'ja', 'jo', ..., 'jacks', 'cokes', 'kecks', 'jokes', 'cakes', 'smoke', 'comes', 'makes', 'cameos']

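If I let this code run as it is, it takes 10 to 15 minutes to finish. On the other hand, every attempt I have made to get it to run within the time limit either works only for 5 or fewer characters or returns the wrong results.

So my question is: how can I optimize this code so that it returns the correct results in under 30 seconds?

Edit: http://www.mso.anu.edu.au/~ralph/OPTED/v003 is the site I am scraping the dictionary files from.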

2 Answers:

Answer 0: (score: 1)

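Storing all of the permutations in a list before testing whether they are valid wastes both RAM and time. Instead, test each permutation as it is generated, and save the valid ones into a set to eliminate duplicates.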

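Duplicates can occur because of the way itertools.permutations works. From its documentation:

"Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeat values in each permutation."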

Your input word "SMOKEJACK" contains 2 Ks, so every permutation that contains a K gets generated twice.
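
A small check makes the duplication visible. This is only an illustrative sketch: the three-letter input "AKK" is a made-up stand-in for the full word, chosen so the output stays short.

```python
from itertools import permutations

# "AKK" is a hypothetical short input with a repeated letter,
# standing in for the two Ks in "SMOKEJACK".
raw = [''.join(p) for p in permutations("AKK", 2)]
unique = set(raw)

print(raw)                    # every arrangement, repeats included
print(len(raw), len(unique))  # how many were generated vs. how many are distinct
```

Each arrangement that uses a K shows up once per K position, which is why collecting matches in a set (rather than a list) is enough to remove the repeats.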

Anyway, here is some code that does the job, using the SOWPODS Scrabble word list for English.

from itertools import permutations

# Get all the words from the SOWPODS file
all_words = set('AI')
fname = 'scrabble_wordlist_sowpods.txt'
with open(fname) as f:
    all_words.update(f.read().splitlines())

print(len(all_words))

choices = 'SMOKEJACK'

# Generate all permutations of `choices` from length 3 to 8 
# and save them in a set to eliminate duplicates.
matches = set()
for n in range(3, 9):
    for t in permutations(choices, n):
        s = ''.join(t)
        if s in all_words:
            matches.add(s)

for i, s in enumerate(sorted(matches)):
    print('{:3} {}'.format(i, s))

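This code runs in about 2.5 seconds on my rather ancient 32-bit 2GHz machine running Python 3.6.0 on Linux. It is slightly faster on Python 2 (since Python 2 strings are ASCII, not Unicode).

Answer 1: (score: 1)

Instead of generating all permutations of the letters, you should use a Prefix Tree, or Trie, to keep track of all the prefixes of valid words.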

def make_trie(words):
    res = {}
    for word in words:
        d = res
        for c in word:
            d = d.setdefault(c, {})
        d["."] = None
    return res

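Here we use d["."] = None to mark the places where a prefix is itself a valid word. Creating the trie may take a few seconds, but you only have to do it once.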
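
To make the role of the "." marker concrete, here is a minimal lookup sketch. The contains helper is hypothetical (it is not part of the answer's code) and assumes the words themselves never contain a "." character; make_trie is repeated only so the snippet runs on its own.

```python
def make_trie(words):
    res = {}
    for word in words:
        d = res
        for c in word:
            d = d.setdefault(c, {})
        d["."] = None
    return res

def contains(trie, word):
    """Hypothetical helper: True only if `word` was inserted as a complete word."""
    node = trie
    for c in word:
        if c not in node:
            return False      # no word in the trie starts with this prefix
        node = node[c]
    return "." in node        # the marker tells a full word from a mere prefix

trie = make_trie(["cat", "cats", "car"])
print(contains(trie, "cat"))  # inserted as a word
print(contains(trie, "ca"))   # only a prefix of other words
```

The early return is what the search relies on: as soon as a prefix is missing from the trie, everything that would extend it can be skipped.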

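Now we can go through our letters in a recursive function, checking for each letter whether it contributes to a valid prefix at the current stage of the recursion. (The rest = letters[:i] + letters[i+1:] part is not very efficient, but as we will see, it does not really matter.)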

def find_words(trie, letters, prefix=""):
    if "." in trie:  # found a full valid word
        yield prefix
    for i, c in enumerate(letters):
        if c in trie:  # contributes to valid prefix
            rest = letters[:i] + letters[i+1:]
            for res in find_words(trie[c], rest, prefix + c):
                yield res  # all words starting with that prefix

Minimal example:

>>> trie = make_trie(["cat", "cats", "act", "car", "carts", "cash"])
>>> trie
{'a': {'c': {'t': {'.': None}}}, 'c': {'a': {'r': {'t': {'s': 
    {'.':  None}}, '.': None}, 's': {'h': {'.': None}}, 't': 
    {'s': {'.': None}, '.': None}}}}
>>> set(find_words(trie, "acst"))
{'cat', 'act', 'cats'}

Or with your 9 letters and the words from sowpods.txt:

with open("sowpods.txt") as words:
    trie = make_trie(map(str.strip, words))  # ~1.3 s on my system, only once
    res = set(find_words(trie, "SMOKEJACK")) #  ~2 ms on my system

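Since you have duplicate letters, you have to pipe the results through a set. This yields 153 words after a total of 623 recursive calls to find_words (measured with a counter variable). Compare that to the 216,555 words in the sowpods.txt file, and to the total of 986,409 permutations of all 1-9 letter combinations that would otherwise have to be checked. So once the trie has been generated, res = set(find_words(...)) takes only a few milliseconds.

You could also change the find_words function to use a mutable dictionary of letter counts instead of a string or list of letters. That way no duplicates are generated and the function is called fewer times, but the overall running time does not change much.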
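
The 986,409 figure can be reproduced with a few lines of arithmetic. This is a sketch that counts ordered selections by position, the same way itertools.permutations does, so the two Ks are treated as distinct.

```python
from math import factorial

n = 9  # number of letters in "SMOKEJACK"

# Ordered selections of k letters out of n: n! / (n - k)!
total = sum(factorial(n) // factorial(n - k) for k in range(1, n + 1))
print(total)  # 986409
```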

def find_words(trie, letters, prefix=""):
    if "." in trie:
        yield prefix
    for c in letters:
        if letters[c] and c in trie:
            letters[c] -= 1
            for res in find_words(trie[c], letters, prefix + c):
                yield res
            letters[c] += 1

Then call it like this: find_words(trie, collections.Counter("SMOKEJACK"))
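
Put together, the Counter-based variant can be tried on the small word list from the minimal example. This is a self-contained sketch; the letters "acstt" are an assumed test input with a repeated letter, picked to show that each word is yielded only once.

```python
from collections import Counter

def make_trie(words):
    res = {}
    for word in words:
        d = res
        for c in word:
            d = d.setdefault(c, {})
        d["."] = None
    return res

def find_words(trie, letters, prefix=""):
    if "." in trie:                 # prefix is a complete word
        yield prefix
    for c in letters:
        if letters[c] and c in trie:
            letters[c] -= 1         # use one copy of this letter
            for res in find_words(trie[c], letters, prefix + c):
                yield res
            letters[c] += 1         # give it back before trying the next letter

trie = make_trie(["cat", "cats", "act", "car", "carts", "cash"])
found = list(find_words(trie, Counter("acstt")))  # note the two Ts
print(sorted(found))
```

Because the loop iterates over distinct letters rather than positions, the second T never starts a duplicate branch, so no set is needed to deduplicate the results.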