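Program:
This is a program that, given a list of sentence-starting words (seedBank) and a dictionary of word pairs (pairs), tries to build a gibberish sentence. Both inputs come from a text file and record which words follow which.
For example, a text.txt file containing 'This is a cat. He is a dog.' would mean we pass in the following: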
seedBank = ['This', 'He']
pairs = { 'This':['is'],'is':['a','a'],'a':['cat','dog'],'He':['is'] }
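The function uses these inputs to create a randomly generated sentence. The result is gibberish because it only follows a semi-grammatical format.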
import random

def gibberish_sentence(seedBank, pairs):
    gibSentence = []
    gibSentence.append(random.choice(seedBank))  # random seed
    x = gibSentence[0]
    while pairs.get(x) is not None:  # loop while x is a key in the dictionary
        y = random.choice(pairs.get(x))  # random value of key x
        gibSentence.append(y)  # random value is added to the sentence
        x = y  # key x is reset to y
    return ' '.join(gibSentence)  # string
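Problem:
This program works fine on small inputs like the ones above, with a defined random.seed(value), but when given a very large set of inputs (seedBank and pairs) it fails with a MemoryError. So my question is: what in this program could cause trouble when handling larger arguments?
Note that the arguments are not really that large. I don't have the text document at hand, but it is not big enough to, say, run out of RAM.
Error code:
Thank you very much.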
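Resolution: Thanks! The problem has been solved. It was actually the while condition causing it: the loop walked through the entire text instead of ending when it reached a word with a full stop, question mark, etc. Essentially this made it overload memory, but thanks everyone for the help!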
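The fix described in the resolution might look roughly like the sketch below. It assumes that words in pairs keep their trailing punctuation (so 'dog.' is a different word from 'dog'), which the question does not actually show. TERMINATORS is a hypothetical name introduced here for illustration.

```python
import random

# Assumption: these characters mark the end of a sentence in the source text.
TERMINATORS = ('.', '?', '!')

def gibberish_sentence(seedBank, pairs):
    x = random.choice(seedBank)  # random seed word
    words = [x]
    # Stop when x has no successors OR when x already ends a sentence.
    while x in pairs and not x.endswith(TERMINATORS):
        x = random.choice(pairs[x])
        words.append(x)
    return ' '.join(words)
```

With this condition, a sentence-final word such as 'dog.' ends the loop even if it also appears as a key in pairs.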
Answer 0 (score: 3)
Without your actual pairs it's hard to say, but if all the words refer back to each other at some point, an infinite loop is possible:
pairs = { 'someone':['thinks'],'thinks':['that','how'],'that':['someone','anyone'],'how':['someone'], 'anyone': ['thinks'] }
will never finish.
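One way to check whether a given pairs dict has this property is to look for a cycle in it. The sketch below is an assumption-laden illustration, not part of the original answer: has_cycle is a hypothetical helper that runs an iterative depth-first search over the word graph. A cycle only means a sentence can run arbitrarily long. It loops forever for certain only when no word in the cycle leads to a word without successors, as in the example above.

```python
def has_cycle(pairs):
    """Return True if following successors in pairs can revisit a word."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}
    for start in pairs:
        if colour.get(start, WHITE) != WHITE:
            continue
        colour[start] = GREY
        stack = [(start, iter(pairs.get(start, ())))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                state = colour.get(nxt, WHITE)
                if state == GREY:  # nxt is on the current path: cycle found
                    return True
                if state == WHITE:
                    colour[nxt] = GREY
                    stack.append((nxt, iter(pairs.get(nxt, ()))))
                    break
            else:
                # All successors explored without finding a cycle.
                colour[node] = BLACK
                stack.pop()
    return False
```

The search is iterative rather than recursive so that a large pairs dict does not hit Python's recursion limit.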
Answer 1 (score: 3)
As Tim Pietzcker noted, if there is a cycle in pairs, your code can loop forever. Here is the most basic example:
>>> seedBank = ['and']
>>> pairs = {'and': ['on'], 'on': ['and']}
>>> gibberish_sentence(seedBank, pairs) # just keeps going
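You can make sure that generated sentences (eventually) end by modifying the pairs dict so that it contains a sentinel value wherever a word appears as the last word of a sentence. For example, for a source text like 'You and me and the dog.':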
seedBank = ['You']
pairs = {
    'You': ['and'],
    'and': ['me', 'the'],
    'me': ['and'],
    'the': ['dog'],
    'dog': ['.'],
}
...and add a check for the sentinel in gibberish_sentence():
import random

def gibberish_sentence(seedBank, pairs):
    gibSentence = []
    gibSentence.append(random.choice(seedBank))  # random seed
    x = gibSentence[0]
    while pairs.get(x) is not None:  # loop while x is a key in the dictionary
        y = random.choice(pairs.get(x))  # random value of key x
        if y == '.':  # sentinel reached: the sentence is complete
            break
        gibSentence.append(y)  # random value is added to the sentence
        x = y  # key x is reset to y
    return ' '.join(gibSentence)  # string
...which gives the sentence a chance to terminate:
>>> gibberish_sentence(seedBank, pairs)
'You and the dog'
>>> gibberish_sentence(seedBank, pairs)
'You and me and me and me and me and me and the dog'
>>> gibberish_sentence(seedBank, pairs)
'You and me and the dog'
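The answer does not show how the sentinel gets into pairs. A minimal sketch of one way to do it is below. The name build_inputs and the splitting rules are assumptions for illustration: it splits the text on '.', '?' and '!', and records the same '.' sentinel after every sentence-final word, whatever punctuation actually ended the sentence.

```python
import re

SENTINEL = '.'  # same sentinel value as in the example above

def build_inputs(text):
    """Return (seedBank, pairs), with SENTINEL following each sentence-final word."""
    seedBank, pairs = [], {}
    for sentence in re.split(r'[.?!]', text):
        words = sentence.split()
        if not words:  # skip the empty piece after the last terminator
            continue
        seedBank.append(words[0])
        # Pair every word with its successor; the last word gets the sentinel.
        for current, following in zip(words, words[1:] + [SENTINEL]):
            pairs.setdefault(current, []).append(following)
    return seedBank, pairs
```

Calling build_inputs('You and me and the dog.') yields the seedBank and pairs shown above.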
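Answer 2 (score: 0)
Joining a list of strings isn't the worst approach, but it isn't the best in terms of space efficiency.
Consider using something like StringIO (untested, of course):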
from io import StringIO  # the original answer used cStringIO (Python 2 only)
import random

def gibberish_sentence(seedBank, pairs):
    seed = random.choice(seedBank)
    gibSentence = StringIO()
    gibSentence.write(seed)  # random seed
    gibSentence.write(' ')
    x = seed
    while pairs.get(x) is not None:  # loop while x is a key in the dictionary
        y = random.choice(pairs.get(x))  # random value of key x
        gibSentence.write(y)  # random value is added to the buffer
        gibSentence.write(' ')
        x = y  # key x is reset to y
    return gibSentence.getvalue()  # string (note: ends with a trailing space)
Here's a comparison of different string concatenation methods in terms of operations per second and memory consumption.
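To check the trade-off on your own machine, something like the following sketch could be used. It is only an illustration: the helper names join_list and write_stringio, the word count, and the repeat count are arbitrary choices made here, and the timings it prints will vary by machine and Python version.

```python
import timeit
from io import StringIO

def join_list(words):
    # Build the sentence with a single join over the list.
    return ' '.join(words)

def write_stringio(words):
    # Build the sentence by writing each word into an in-memory buffer.
    buf = StringIO()
    for i, word in enumerate(words):
        if i:
            buf.write(' ')
        buf.write(word)
    return buf.getvalue()

words = ['word'] * 100000  # arbitrary test input
assert join_list(words) == write_stringio(words)

for fn in (join_list, write_stringio):
    seconds = timeit.timeit(lambda: fn(words), number=10)
    print('%s: %.4f s for 10 runs' % (fn.__name__, seconds))
```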
Answer 3 (score: 0)
Using a generator, which is very memory-efficient, avoids building the list.
import random

def gibberish_sentence(seedBank, pairs):
    x = random.choice(seedBank)  # random seed
    yield x
    while pairs.get(x) is not None:  # loop while x is a key in the dictionary
        y = random.choice(pairs.get(x))  # random value of key x
        yield y
        x = y  # key x is reset to y

print(' '.join(gibberish_sentence(seedBank, pairs)))  # string
Or, if the string has to be built inside the function, you can do it like this:
import random

def gibberish_sentence(seedBank, pairs):
    def words():
        x = random.choice(seedBank)  # random seed
        yield x
        while pairs.get(x) is not None:  # loop while x is a key in the dictionary
            y = random.choice(pairs.get(x))  # random value of key x
            yield y
            x = y  # key x is reset to y
    return ' '.join(words())  # string
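Note that a generator on its own does not stop a cyclic pairs dict from producing words forever. Because it is lazy, though, the caller can cap the length. The sketch below uses itertools.islice for that. The max_words parameter and the gibberish_words name are assumptions added here, not part of the original answer.

```python
import random
from itertools import islice

def gibberish_words(seedBank, pairs):
    x = random.choice(seedBank)  # random seed
    yield x
    while x in pairs:  # loop while x has recorded successors
        x = random.choice(pairs[x])
        yield x

def gibberish_sentence(seedBank, pairs, max_words=50):
    # islice stops pulling words after max_words, even if pairs has a cycle.
    return ' '.join(islice(gibberish_words(seedBank, pairs), max_words))
```

For example, with the cyclic pairs {'and': ['on'], 'on': ['and']} and seedBank ['and'], calling gibberish_sentence(seedBank, pairs, max_words=5) returns 'and on and on and' instead of running forever.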