我想修改下面的脚本,以便它从脚本生成的随机数量的句子中创建段落。换句话说,在添加换行符之前连接一个随机数(如1-5)句子。
脚本工作正常,但输出是由换行符分隔的短句。我想把一些句子收集成段落。
有关最佳做法的任何想法?感谢。
"""
from: http://code.activestate.com/recipes/194364-the-markov-chain-algorithm/?in=lang-python
"""
import random;
import sys;
stopword = "\n" # Since we split on whitespace, this can never be a word
stopsentence = (".", "!", "?",) # Cause a "new sentence" if found at the end of a word
sentencesep = "\n" #String used to seperate sentences
# GENERATE TABLE
w1 = stopword
w2 = stopword
table = {}
for line in sys.stdin:
for word in line.split():
if word[-1] in stopsentence:
table.setdefault( (w1, w2), [] ).append(word[0:-1])
w1, w2 = w2, word[0:-1]
word = word[-1]
table.setdefault( (w1, w2), [] ).append(word)
w1, w2 = w2, word
# Mark the end of the file
table.setdefault( (w1, w2), [] ).append(stopword)
# GENERATE SENTENCE OUTPUT
maxsentences = 20
w1 = stopword
w2 = stopword
sentencecount = 0
sentence = []
while sentencecount < maxsentences:
newword = random.choice(table[(w1, w2)])
if newword == stopword: sys.exit()
if newword in stopsentence:
print ("%s%s%s" % (" ".join(sentence), newword, sentencesep))
sentence = []
sentencecount += 1
else:
sentence.append(newword)
w1, w2 = w2, newword
编辑01:
好的,我拼凑了一个简单的&#34;段包装,&#34;这很好地将句子收集到段落中,但它与句子生成器的输出相混淆 - 我对第一个单词的重复性过高,例如,其他问题。
但前提是声音;我只需要弄清楚为什么句子循环的功能受到段落循环的添加的影响。请告知您是否可以看到问题:
###
# usage: $ python markov_sentences.py < input.txt > output.txt
# from: http://code.activestate.com/recipes/194364-the-markov-chain-algorithm/?in=lang-python
###
import random;
import sys;
stopword = "\n" # Since we split on whitespace, this can never be a word
stopsentence = (".", "!", "?",) # Cause a "new sentence" if found at the end of a word
paragraphsep = "\n\n" #String used to seperate sentences
# GENERATE TABLE
w1 = stopword
w2 = stopword
table = {}
for line in sys.stdin:
for word in line.split():
if word[-1] in stopsentence:
table.setdefault( (w1, w2), [] ).append(word[0:-1])
w1, w2 = w2, word[0:-1]
word = word[-1]
table.setdefault( (w1, w2), [] ).append(word)
w1, w2 = w2, word
# Mark the end of the file
table.setdefault( (w1, w2), [] ).append(stopword)
# GENERATE PARAGRAPH OUTPUT
maxparagraphs = 10
paragraphs = 0 # reset the outer 'while' loop counter to zero
while paragraphs < maxparagraphs: # start outer loop, until maxparagraphs is reached
w1 = stopword
w2 = stopword
stopsentence = (".", "!", "?",)
sentence = []
sentencecount = 0 # reset the inner 'while' loop counter to zero
maxsentences = random.randrange(1,5) # random sentences per paragraph
while sentencecount < maxsentences: # start inner loop, until maxsentences is reached
newword = random.choice(table[(w1, w2)]) # random word from word table
if newword == stopword: sys.exit()
elif newword in stopsentence:
print ("%s%s" % (" ".join(sentence), newword), end=" ")
sentencecount += 1 # increment the sentence counter
else:
sentence.append(newword)
w1, w2 = w2, newword
print (paragraphsep) # newline space
paragraphs = paragraphs + 1 # increment the paragraph counter
# EOF
编辑02:
根据以下答案将sentence = []
添加到elif
声明中。好的;
elif newword in stopsentence:
print ("%s%s" % (" ".join(sentence), newword), end=" ")
sentence = [] # I have to be here to make the new sentence start as an empty list!!!
sentencecount += 1 # increment the sentence counter
编辑03:
这是此脚本的最后一次迭代。感谢悲伤帮助整理出来。我希望其他人可以玩得开心,我知道我会的。 ;)
仅供参考:有一个小工件 - 如果您使用此脚本,还有一个额外的段末空间可能需要清理。但是,除此之外,马尔可夫链文本生成的完美实现。
###
# usage: python markov_sentences.py < input.txt > output.txt
# from: http://code.activestate.com/recipes/194364-the-markov-chain-algorithm/?in=lang-python
###
import random;
import sys;
stopword = "\n" # Since we split on whitespace, this can never be a word
stopsentence = (".", "!", "?",) # Cause a "new sentence" if found at the end of a word
sentencesep = "\n" #String used to seperate sentences
# GENERATE TABLE
w1 = stopword
w2 = stopword
table = {}
for line in sys.stdin:
for word in line.split():
if word[-1] in stopsentence:
table.setdefault( (w1, w2), [] ).append(word[0:-1])
w1, w2 = w2, word[0:-1]
word = word[-1]
table.setdefault( (w1, w2), [] ).append(word)
w1, w2 = w2, word
# Mark the end of the file
table.setdefault( (w1, w2), [] ).append(stopword)
# GENERATE SENTENCE OUTPUT
maxsentences = 20
w1 = stopword
w2 = stopword
sentencecount = 0
sentence = []
paragraphsep = "\n"
count = random.randrange(1,5)
while sentencecount < maxsentences:
newword = random.choice(table[(w1, w2)]) # random word from word table
if newword == stopword: sys.exit()
if newword in stopsentence:
print ("%s%s" % (" ".join(sentence), newword), end=" ")
sentence = []
sentencecount += 1 # increment the sentence counter
count -= 1
if count == 0:
count = random.randrange(1,5)
print (paragraphsep) # newline space
else:
sentence.append(newword)
w1, w2 = w2, newword
# EOF
答案 0 :(得分:3)
您需要复制
sentence = []
回到
elif newword in stopsentence:
子句。
所以
while paragraphs < maxparagraphs: # start outer loop, until maxparagraphs is reached
w1 = stopword
w2 = stopword
stopsentence = (".", "!", "?",)
sentence = []
sentencecount = 0 # reset the inner 'while' loop counter to zero
maxsentences = random.randrange(1,5) # random sentences per paragraph
while sentencecount < maxsentences: # start inner loop, until maxsentences is reached
newword = random.choice(table[(w1, w2)]) # random word from word table
if newword == stopword: sys.exit()
elif newword in stopsentence:
print ("%s%s" % (" ".join(sentence), newword), end=" ")
sentence = [] # I have to be here to make the new sentence start as an empty list!!!
sentencecount += 1 # increment the sentence counter
else:
sentence.append(newword)
w1, w2 = w2, newword
print (paragraphsep) # newline space
paragraphs = paragraphs + 1 # increment the paragraph counter
修改
这是一个不使用外循环的解决方案。
"""
from: http://code.activestate.com/recipes/194364-the-markov-chain-algorithm/?in=lang-python
"""
import random;
import sys;
stopword = "\n" # Since we split on whitespace, this can never be a word
stopsentence = (".", "!", "?",) # Cause a "new sentence" if found at the end of a word
sentencesep = "\n" #String used to seperate sentences
# GENERATE TABLE
w1 = stopword
w2 = stopword
table = {}
for line in sys.stdin:
for word in line.split():
if word[-1] in stopsentence:
table.setdefault( (w1, w2), [] ).append(word[0:-1])
w1, w2 = w2, word[0:-1]
word = word[-1]
table.setdefault( (w1, w2), [] ).append(word)
w1, w2 = w2, word
# Mark the end of the file
table.setdefault( (w1, w2), [] ).append(stopword)
# GENERATE SENTENCE OUTPUT
maxsentences = 20
w1 = stopword
w2 = stopword
sentencecount = 0
sentence = []
paragraphsep == "\n\n"
count = random.randrange(1,5)
while sentencecount < maxsentences:
newword = random.choice(table[(w1, w2)])
if newword == stopword: sys.exit()
if newword in stopsentence:
print ("%s%s" % (" ".join(sentence), newword), end=" ")
sentence = []
sentencecount += 1
count -= 1
if count == 0:
count = random.randrange(1,5)
print (paragraphsep)
else:
sentence.append(newword)
w1, w2 = w2, newword
答案 1 :(得分:1)
你了解这段代码吗?我打赌你可以找到打印句子的位,并将其改为一起打印几个句子而不返回。您可以在句子位周围添加另一个while循环以获得多个段落。
语法提示:
print 'hello'
print 'there'
hello
there
print 'hello',
print 'there'
hello there
print 'hello',
print
print 'there'
关键是print语句末尾的逗号会阻止行尾的返回,而空白的print语句会打印一个返回。