Question

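The program:

This is a program that, given a list of sentence-starting words (seedBank) and a dictionary of word pairs (pairs), tries to build a gibberish sentence using information from a text file about which words follow which.

For example, a text.txt file containing 'This is a cat. He is a dog.' means we would pass in the following: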

seedBank = ['This', 'He']

pairs = { 'This':['is'],'is':['a','a'],'a':['cat','dog'],'He':['is'] }
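The question does not show how these two inputs are derived from the file, so the following is only a minimal sketch of one way it might be done. The helper name build_inputs and the choice to split sentences on '.', '?' and '!' are assumptions, not part of the original program.

```python
import re
from collections import defaultdict

def build_inputs(text):
    """Hypothetical helper: derive (seedBank, pairs) from raw text."""
    seed_bank = []
    pairs = defaultdict(list)
    # Assumption: sentences end at '.', '?' or '!' and punctuation is dropped.
    for sentence in re.split(r'[.?!]', text):
        words = sentence.split()
        if not words:
            continue
        seed_bank.append(words[0])  # first word of each sentence is a seed
        for first, second in zip(words, words[1:]):
            pairs[first].append(second)  # record each adjacent word pair
    return seed_bank, dict(pairs)

seedBank, pairs = build_inputs('This is a cat. He is a dog.')
```

For the sample text this reproduces the seedBank and pairs shown above, including the duplicated 'a' under 'is'.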

The function then uses these inputs to generate a random sentence. It is gibberish because it only follows a semi-grammatical format.

import random

def gibberish_sentence(seedBank, pairs):
    gibSentence = []
    gibSentence.append(random.choice(seedBank))  # random seed word
    x = gibSentence[0]
    while pairs.get(x) is not None:  # loop while x is a key in the dictionary
        y = random.choice(pairs.get(x))  # random follower of key x
        gibSentence.append(y)  # follower is appended to the sentence
        x = y  # key x is reset to y
    return ' '.join(gibSentence)  # string

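The problem:

This program works fine for small inputs like the ones above, with a fixed random.seed(value), but when given a very large set of inputs (seedBank and pairs) it fails with a memory error. So my question is: what in this program could cause problems when handling larger arguments?

Note that the arguments aren't actually that large. I don't have the text document at hand, but it isn't so big that it would, for example, exhaust the available RAM.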

Error message: (screenshot not preserved)

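Thank you very much.

Resolution: Thanks! The problem is solved. It was in fact the while condition that caused it: the loop ran through the whole text instead of ending when it reached a word with a full stop, question mark, etc. Essentially this overloaded memory. Thanks to everyone for the help!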
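The fixed code was not posted, so the following is only a guess at what such a stopping condition might look like. It assumes the words in pairs keep their trailing punctuation (for example 'cat.'); the function name and the SENTENCE_END tuple are hypothetical.

```python
import random

# Assumption: words keep their punctuation, so a sentence-final word ends in one of these.
SENTENCE_END = ('.', '?', '!')

def gibberish_sentence_until_stop(seedBank, pairs):
    words = [random.choice(seedBank)]  # random seed word
    # Stop at the first sentence-final word, or when the word has no followers.
    while not words[-1].endswith(SENTENCE_END) and pairs.get(words[-1]):
        words.append(random.choice(pairs[words[-1]]))
    return ' '.join(words)

demo_pairs = {'This': ['is'], 'is': ['a'], 'a': ['cat.'], 'cat.': ['He'], 'He': ['is']}
demo_sentence = gibberish_sentence_until_stop(['This'], demo_pairs)
```

Without the endswith check, this demo_pairs dictionary would loop forever through 'cat.' and 'He'.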

Answer 1

Without your actual pairs it's hard to say, but if all the words refer to each other at some point, an infinite loop is possible:

pairs = { 'someone':['thinks'],'thinks':['that','how'],'that':['someone','anyone'],'how':['someone'], 'anyone': ['thinks'] }

will never finish.
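To check whether a given pairs dictionary contains such a cycle at all, a depth-first search can be used. This is a minimal sketch; has_cycle is a hypothetical helper name. A cycle does not guarantee an endless loop when some words also have a way out, but it is a necessary condition for one.

```python
def has_cycle(pairs):
    """Return True if following words through pairs can revisit a word."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}
    for start in pairs:
        if state.get(start, WHITE) != WHITE:
            continue
        state[start] = GREY
        # Explicit stack of (word, iterator over its followers) avoids recursion limits.
        stack = [(start, iter(pairs.get(start, [])))]
        while stack:
            word, followers = stack[-1]
            for nxt in followers:
                seen = state.get(nxt, WHITE)
                if seen == GREY:  # back edge to the current path: a cycle
                    return True
                if seen == WHITE:
                    state[nxt] = GREY
                    stack.append((nxt, iter(pairs.get(nxt, []))))
                    break
            else:
                state[word] = BLACK  # all followers explored
                stack.pop()
    return False

cyclic = {'someone': ['thinks'], 'thinks': ['that', 'how'],
          'that': ['someone', 'anyone'], 'how': ['someone'], 'anyone': ['thinks']}
acyclic = {'This': ['is'], 'is': ['a', 'a'], 'a': ['cat', 'dog'], 'He': ['is']}
```

Here has_cycle(cyclic) is True for the dictionary above, while has_cycle(acyclic) is False for the small example from the question.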

Answer 2

As Tim Pietzcker mentioned, if there is a cycle in pairs, your code can loop forever. Here is the most basic example:

>>> seedBank = ['and']
>>> pairs = {'and': ['on'], 'on': ['and']}
>>> gibberish_sentence(seedBank, pairs)  # just keeps going

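You can make sure that generated sentences (eventually) end by modifying the pairs dict so that it contains a sentinel value whenever a word appears as the last word of a sentence. For example, for a source text like 'You and me and the dog.':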

seedBank = ['You']

pairs = {
    'You': ['and'],
    'and': ['me', 'the'],
    'me': ['and'],
    'the': ['dog'],
    'dog': ['.'],
}

...and add a check for the sentinel in gibberish_sentence():

def gibberish_sentence(seedBank, pairs):
    gibSentence = []
    gibSentence.append(random.choice(seedBank))  # random seed word
    x = gibSentence[0]
    while pairs.get(x) is not None:  # loop while x is a key in the dictionary
        y = random.choice(pairs.get(x))  # random follower of key x
        if y == '.':  # sentinel reached: end of sentence
            break
        gibSentence.append(y)  # follower is appended to the sentence
        x = y  # key x is reset to y
    return ' '.join(gibSentence)  # string

...which gives the sentence a chance to terminate:

>>> gibberish_sentence(seedBank, pairs)
'You and the dog'
>>> gibberish_sentence(seedBank, pairs)
'You and me and me and me and me and me and the dog'
>>> gibberish_sentence(seedBank, pairs)
'You and me and the dog'
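The sentinel only makes termination likely, not certain, so a hard cap on sentence length is a cheap extra safeguard. The following is a sketch of that idea, not part of the answer above; the function name and the max_words parameter are hypothetical.

```python
import random

def gibberish_sentence_capped(seedBank, pairs, max_words=50):
    words = [random.choice(seedBank)]  # random seed word
    while len(words) < max_words:  # hard upper bound on sentence length
        followers = pairs.get(words[-1])
        if not followers:  # word has no followers: stop
            break
        nxt = random.choice(followers)
        if nxt == '.':  # sentinel reached: end of sentence
            break
        words.append(nxt)
    return ' '.join(words)

capped = gibberish_sentence_capped(['and'], {'and': ['on'], 'on': ['and']}, max_words=5)
```

With the two-word cycle from earlier, the call above stops after five words instead of running until memory is exhausted.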

Answer 3

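Joining a list of strings isn't the worst approach, but it isn't the best in terms of space efficiency.

Consider using something like StringIO (untested, of course):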

from cStringIO import StringIO  # Python 2 only; on Python 3 use io.StringIO
import random

def gibberish_sentence(seedBank, pairs):
    seed = random.choice(seedBank)
    gibSentence = StringIO()
    gibSentence.write(seed)              # random seed word
    gibSentence.write(' ')
    x = seed
    while pairs.get(x) is not None:      # loop while x is a key in the dictionary
        y = random.choice(pairs.get(x))  # random follower of key x
        gibSentence.write(y)             # follower is written to the buffer
        gibSentence.write(' ')
        x = y                            # key x is reset to y
    return gibSentence.getvalue()        # string (with a trailing space)

Here's a comparison of different string concatenation methods in terms of operations per second and memory consumption.
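To compare the two approaches on your own machine, something like the following rough sketch could be used. It targets Python 3 (io.StringIO rather than cStringIO), measures time only, and the helper names and sample size are arbitrary assumptions.

```python
import io
import timeit

def join_words(words):
    return ' '.join(words)

def stringio_words(words):
    buf = io.StringIO()
    for i, word in enumerate(words):
        if i:
            buf.write(' ')  # separator before every word except the first
        buf.write(word)
    return buf.getvalue()

sample = ['word'] * 10000  # arbitrary test data

if __name__ == '__main__':
    for fn in (join_words, stringio_words):
        seconds = timeit.timeit(lambda: fn(sample), number=100)
        print('%s: %.4f s' % (fn.__name__, seconds))
```

Both helpers return the same string, so any difference in the timings comes from how the string is built.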

Answer 4

Using a generator, which is very memory efficient, avoids building the list.

import random

def gibberish_sentence(seedBank, pairs):
    x = random.choice(seedBank)  # random seed word
    yield x
    while pairs.get(x) is not None:  # loop while x is a key in the dictionary
        y = random.choice(pairs.get(x))  # random follower of key x
        yield y
        x = y  # key x is reset to y

print(' '.join(gibberish_sentence(seedBank, pairs)))  # string

Or, if the string must be built inside the function, you can do this:

def gibberish_sentence(seedBank, pairs):
    def words():
        x = random.choice(seedBank)  # random seed word
        yield x
        while pairs.get(x) is not None:  # loop while x is a key in the dictionary
            y = random.choice(pairs.get(x))  # random follower of key x
            yield y
            x = y  # key x is reset to y
    return ' '.join(words())  # string
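Note that ' '.join() still consumes the whole generator, so a cycle in pairs will exhaust memory here as well. Because a generator is lazy, though, its output can be capped from the outside with itertools.islice. A minimal sketch, with gibberish_words as a hypothetical name for the generator version:

```python
import random
from itertools import islice

def gibberish_words(seedBank, pairs):
    x = random.choice(seedBank)  # random seed word
    yield x
    while pairs.get(x) is not None:  # loop while x is a key in the dictionary
        x = random.choice(pairs[x])  # random follower of key x
        yield x

endless_pairs = {'and': ['on'], 'on': ['and']}
# Take at most six words, even though the generator itself would never stop.
first_six = list(islice(gibberish_words(['and'], endless_pairs), 6))
```

The generator is only advanced six times, so the endless cycle never gets the chance to fill memory.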

Memory error when calling a function with large arguments

4 answers: