Memory error when calling a function with large arguments

Date: 2015-03-19 06:27:19

Tags: python memory

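The program:

This program is given a list of sentence-starting words (seedBank) and a dictionary of word pairs (pairs), built from a text file and recording which words follow which. From these it tries to create a gibberish sentence.

For example, a text.txt file containing 'This is a cat. He is a dog.' means we would pass in the following: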

seedBank = ['This', 'He']

pairs = { 'This':['is'],'is':['a','a'],'a':['cat','dog'],'He':['is'] } 

The function uses these inputs to build a randomly generated sentence. The result is gibberish because it only follows a semi-grammatical pattern.

import random

def gibberish_sentence(seedBank, pairs):
    gibSentence = []
    gibSentence.append(random.choice(seedBank))  # random seed word
    x = gibSentence[0]
    while pairs.get(x) is not None:  # loop while x is a key in the dictionary
        y = random.choice(pairs.get(x))  # random value of key x
        gibSentence.append(y)  # random value is added to the word list
        x = y  # key x is reset to y
    return ' '.join(gibSentence)  # string

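Problem:

The program works fine for small inputs like the one above, with a fixed random.seed(value), but when given a very large set of inputs (seedBank and pairs) it fails with a MemoryError. So my question is: what in this program could cause problems when it processes larger arguments?

Note that the arguments aren't actually that large. I don't have the text document at hand, but it is nowhere near big enough to run out of RAM.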

Error: [screenshot of the error not preserved]

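Thank you very much.

Resolution: Thanks! The problem is solved. It was the while condition that caused it: the loop walked through the entire text instead of stopping once it reached a word ending in a full stop, question mark, etc. In essence, that is what exhausted the memory. Thanks, everyone, for the help!

4 answers:

Answer 0 (score: 3)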
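The fix described in the resolution (stop at a word that ends a sentence) can be sketched as follows. This is an illustration, not the asker's actual code: it assumes the words stored in pairs keep their trailing punctuation (e.g. 'cat.'), which the question does not show. It still loops forever on a cycle that never passes through a sentence-ending word.

```python
import random

# Assumption: words in `pairs` keep their trailing punctuation (e.g. 'cat.'),
# so a sentence-ending word can be recognised by its last character.
SENTENCE_END = ('.', '?', '!')

def gibberish_sentence(seedBank, pairs):
    word = random.choice(seedBank)  # random starting word
    sentence = [word]
    # Stop at a sentence-ending word, or when the word has no successors.
    while not word.endswith(SENTENCE_END) and word in pairs:
        word = random.choice(pairs[word])  # random successor of the current word
        sentence.append(word)
    return ' '.join(sentence)
```

With {'This': ['is'], 'is': ['a'], 'a': ['cat.'], 'cat.': ['This']}, the walk stops at 'cat.' even though 'cat.' points back to 'This'.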

Without your actual pairs it's hard to say, but if the words all refer to each other at some point, an infinite loop is possible:

pairs = { 'someone':['thinks'],'thinks':['that','how'],'that':['someone','anyone'],'how':['someone'], 'anyone': ['thinks'] } 

never finishes.

Answer 1 (score: 3)

As Tim Pietzcker mentioned, your code can loop forever if there is a cycle in pairs. Here is the most basic example:
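To check a given pairs dict for this problem before generating, you can run a depth-first search from a seed word and look for a word that leads back to a word already on the current path. The helper below, has_reachable_cycle, is a hypothetical name and a minimal sketch. It is iterative so that large vocabularies do not hit Python's recursion limit. A cycle means an endless sentence is possible. It is only guaranteed when, as in the example above, every successor leads back into the loop.

```python
def has_reachable_cycle(pairs, start):
    """Hypothetical helper: True if following `pairs` from `start`
    can revisit a word, i.e. the generator may never terminate."""
    on_path = {start}  # words on the current DFS path
    done = set()       # words fully explored and known to be cycle-free
    stack = [(start, iter(pairs.get(start, ())))]
    while stack:
        word, successors = stack[-1]
        for nxt in successors:
            if nxt in on_path:
                return True  # back edge: the walk can return to nxt
            if nxt not in done:
                on_path.add(nxt)
                stack.append((nxt, iter(pairs.get(nxt, ()))))
                break  # explore nxt first; resume this iterator later
        else:
            # every successor of `word` explored without finding a cycle
            stack.pop()
            on_path.discard(word)
            done.add(word)
    return False
```

For the pairs above, has_reachable_cycle(pairs, 'someone') returns True. For the question's small example starting at 'This', it returns False.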

>>> seedBank = ['and']
>>> pairs = {'and': ['on'], 'on': ['and']}
>>> gibberish_sentence(seedBank, pairs)  # just keeps going

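You can make sure the generated sentence (eventually) ends by changing the pairs dict so that every word that appears as the last word of a sentence includes a sentinel value among its successors. For example, for a source text like 'You and me and the dog.':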

seedBank = ['You']

pairs = {
    'You': ['and'],
    'and': ['me', 'the'],
    'me': ['and'],
    'the': ['dog'],
    'dog': ['.'],
}

...and add a check for the sentinel in gibberish_sentence():

def gibberish_sentence(seedBank, pairs):
    gibSentence = []
    gibSentence.append(random.choice(seedBank))  # random seed word
    x = gibSentence[0]
    while pairs.get(x) is not None:  # loop while x is a key in the dictionary
        y = random.choice(pairs.get(x))  # random value of key x
        if y == '.':  # sentinel reached: the sentence ends here
            break
        gibSentence.append(y)  # random value is added to the word list
        x = y  # key x is reset to y
    return ' '.join(gibSentence)  # string

...which gives the sentence a chance to terminate:

>>> gibberish_sentence(seedBank, pairs)
'You and the dog'
>>> gibberish_sentence(seedBank, pairs)
'You and me and me and me and me and me and the dog'
>>> gibberish_sentence(seedBank, pairs)
'You and me and the dog'
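The pairs dict in this answer was written by hand. A minimal sketch of building it, together with seedBank, directly from source text follows. It uses a hypothetical helper build_pairs and assumes sentences end with '.', '?' or '!' and words are separated by whitespace. The last word of each sentence gets the '.' sentinel as one of its successors.

```python
import re

def build_pairs(text, sentinel='.'):
    """Hypothetical helper: build seedBank and a pairs dict from source text,
    adding `sentinel` as a successor of each sentence's last word."""
    seed_bank = []
    pairs = {}
    # Assumption: sentences end with '.', '?' or '!'; words are whitespace-separated.
    for sentence in re.split(r'[.?!]', text):
        words = sentence.split()
        if not words:
            continue  # skip the empty piece after the final full stop
        seed_bank.append(words[0])  # first word of each sentence is a seed
        for current, following in zip(words, words[1:]):
            pairs.setdefault(current, []).append(following)
        pairs.setdefault(words[-1], []).append(sentinel)
    return seed_bank, pairs
```

build_pairs('You and me and the dog.') reproduces the seedBank and pairs shown above.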

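Answer 2 (score: 0)

Joining a list of strings isn't the worst option, but it isn't the most space-efficient either.

Consider using something like StringIO (untested, of course):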

try:
    from cStringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3
import random

def gibberish_sentence(seedBank, pairs):
    seed = random.choice(seedBank)
    gibSentence = StringIO()
    gibSentence.write(seed)  # random seed word
    gibSentence.write(' ')
    x = seed
    while pairs.get(x) is not None:  # loop while x is a key in the dictionary
        y = random.choice(pairs.get(x))  # random value of key x
        gibSentence.write(y)  # random value is added to the buffer
        gibSentence.write(' ')
        x = y  # key x is reset to y
    return gibSentence.getvalue().rstrip(' ')  # string, without the trailing space

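Here's a comparison of different string-concatenation methods in terms of operations per second and memory consumption.

Answer 3 (score: 0)

You can avoid building a list altogether by using a generator, which is very memory-efficient.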

import random

def gibberish_sentence(seedBank, pairs):
    x = random.choice(seedBank)  # random seed word
    yield x
    while pairs.get(x) is not None:  # loop while x is a key in the dictionary
        y = random.choice(pairs.get(x))  # random value of key x
        yield y
        x = y  # key x is reset to y

print(' '.join(gibberish_sentence(seedBank, pairs)))  # string

Or, if the string has to be built inside the function, you can do this:

def gibberish_sentence(seedBank, pairs):
    def words():
        x = random.choice(seedBank)  # random seed word
        yield x
        while pairs.get(x) is not None:  # loop while x is a key in the dictionary
            y = random.choice(pairs.get(x))  # random value of key x
            yield y
            x = y  # key x is reset to y
    return ' '.join(words())  # string
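A generator on its own still runs forever on a cyclic pairs dict. Because it is lazy, though, the caller can cap how many words are consumed with itertools.islice. The sketch below adds a hypothetical max_words parameter (its default of 50 is arbitrary) and a hypothetical generator name gibberish_words. It bounds the output length rather than fixing the cycle itself.

```python
import itertools
import random

def gibberish_words(seedBank, pairs):
    # Same idea as the generator above: lazily yields one word at a time.
    x = random.choice(seedBank)
    yield x
    while x in pairs:
        x = random.choice(pairs[x])
        yield x

def gibberish_sentence(seedBank, pairs, max_words=50):
    # islice stops pulling from the generator after max_words words,
    # so even a cyclic `pairs` dict produces a finite sentence.
    return ' '.join(itertools.islice(gibberish_words(seedBank, pairs), max_words))
```

For example, gibberish_sentence(['and'], {'and': ['on'], 'on': ['and']}, max_words=4) returns 'and on and on' instead of looping forever.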