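I'm new to Python and need help with my program. I have code that takes an unformatted text document, does some formatting (sets the page width and margins), and outputs a new text document. All of my code works fine except for the function that produces the final output.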
Here is part of the problematic code:
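def process(document, pagewidth, margins, formats):
    res = []
    onlypw = []
    pwmarg = []
    count = 0
    marg = 0
    for segment in margins:
        # Copy the lines before this segment through unchanged.
        for i in range(count, segment[0]):
            res.append(document[i])
        text = ''
        foundmargin = -1
        # Join the segment's lines (segment[0] through segment[1]) into one string.
        for i in range(segment[0], segment[1]+1):
            marg = segment[2]
            text = text + '\n' + document[i].strip(' ')
        # str.split() splits on any run of whitespace, so blank lines are lost here.
        words = text.split()
        # ... (rest of the function not shown)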
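Note: in case you're wondering about the range, segment[0] is just the start of the document and segment[1] is just the end. My problem is that when I copy the text into words (in words = text.split()), my blank lines are not kept. The output I should get is: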
This is my substitute for pistol and ball. With a
philosophical flourish Cato throws himself upon his sword; I
quietly take to the ship. There is nothing surprising in
this. If they but knew it, almost all men in their degree,
some time or other, cherish very nearly the same feelings
towards the ocean with me.

There now is your insular city of the Manhattoes, belted
round by wharves as Indian isles by coral reefs--commerce
surrounds it with her surf.
What my current output looks like:
This is my substitute for pistol and ball. With a
philosophical flourish Cato throws himself upon his sword; I
quietly take to the ship. There is nothing surprising in
this. If they but knew it, almost all men in their degree,
some time or other, cherish very nearly the same feelings
towards the ocean with me. There now is your insular city of
the Manhattoes, belted round by wharves as Indian isles by
coral reefs--commerce surrounds it with her surf.
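I know the problem happens when I copy the text into words, because that step doesn't keep the blank lines. How can I make sure the blank lines are copied along with the words? Let me know if I should add more code or more details!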
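For reference, a minimal demonstration of why the blank line disappears: str.split() with no argument treats any run of whitespace (spaces, tabs and newlines alike) as a single separator, so the double newline between paragraphs is discarded just like a single space. The sample string is made up from the expected output above.

```python
# A paragraph break ("\n\n") between two sentences.
text = "towards the ocean with me.\n\nThere now is your insular city"

# With no separator argument, split() collapses every run of whitespace,
# including the blank line, into one separator.
words = text.split()
print(words)
# ['towards', 'the', 'ocean', 'with', 'me.', 'There', 'now', 'is', 'your', 'insular', 'city']
```

Nothing in the resulting list records where the paragraph break was, so any code that rebuilds lines from words alone cannot restore it.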
Answer 0 (score: 4)
Split on 2 or more newlines first, then split into words:
import re
paragraphs = re.split('\n\n+', text)
words = [paragraph.split() for paragraph in paragraphs]
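You now have a list of lists, one per paragraph. Process each paragraph separately; afterwards you can join everything back into new text, inserting a double newline between paragraphs.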
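A minimal sketch of that per-paragraph approach, assuming the formatting of each paragraph can be stood in for by the standard library's textwrap.fill (the asker's own page-width and margin logic would go there instead). The names reflow and width are hypothetical and do not come from the question.

```python
import re
import textwrap


def reflow(text, width):
    """Hypothetical helper: re-wrap each paragraph to `width` columns and
    keep exactly one blank line between paragraphs.

    textwrap.fill is only a stand-in for the asker's own formatting step.
    """
    paragraphs = re.split('\n\n+', text)
    # One list of words per paragraph, as in the answer.
    words = [paragraph.split() for paragraph in paragraphs]
    # Format each paragraph on its own; skip paragraphs that have no words.
    wrapped = [textwrap.fill(' '.join(ws), width=width) for ws in words if ws]
    # Rejoin with a double newline so the blank lines come back.
    return '\n\n'.join(wrapped)


sample = "towards the ocean\nwith me.\n\n\nThere now is your\ninsular city"
print(reflow(sample, 20))
# towards the ocean
# with me.
#
# There now is your
# insular city
```

Because the paragraph boundaries are fixed before any words are split out, the wrapping step never sees the blank lines and cannot lose them.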
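I used re.split() to support paragraphs separated by more than 2 newlines; you could use a simple text.split('\n\n') if there are only ever exactly two newlines between paragraphs.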
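A small illustration of that difference, using a made-up string whose paragraphs are separated by three newlines (two blank lines):

```python
import re

# Two paragraphs separated by two blank lines (three newlines).
text = "first paragraph\n\n\nsecond paragraph"

# str.split('\n\n') consumes only two of the three newlines,
# leaving a stray newline at the start of the second paragraph.
plain = text.split('\n\n')
print(plain)
# ['first paragraph', '\nsecond paragraph']

# '\n\n+' matches the whole run of newlines at once.
regex = re.split('\n\n+', text)
print(regex)
# ['first paragraph', 'second paragraph']
```

For word splitting the stray newline is harmless (paragraph.split() ignores it), but with four or more newlines text.split('\n\n') also produces empty strings, which show up as empty paragraphs.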
Answer 1 (score: 1)
Use a regular expression to find the words and the blank lines instead of splitting:
import re
# Each token is either a run of non-whitespace (a word) or '\n\n' (a blank line).
m = re.compile(r'(\S+|\n\n)')
words = m.findall(text)
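A sketch of how that token list might then be consumed, assuming a simple greedy fill to a hypothetical line width and ignoring margins. layout_tokens is a hypothetical helper, not part of the answer: ordinary tokens are packed onto the current line, and each '\n\n' token ends the current line and emits a single blank line.

```python
import re

TOKEN = re.compile(r'(\S+|\n\n)')


def layout_tokens(text, width):
    """Hypothetical helper: greedy line filling over findall() tokens,
    treating each '\n\n' token as a paragraph break. Margins are ignored."""
    lines = []
    current = ''
    for token in TOKEN.findall(text):
        if token == '\n\n':
            # Paragraph break: flush the current line, then add one blank
            # line (but never two in a row).
            if current:
                lines.append(current)
                current = ''
            if lines and lines[-1] != '':
                lines.append('')
            continue
        if not current:
            current = token
        elif len(current) + 1 + len(token) <= width:
            current += ' ' + token
        else:
            # The word does not fit: start a new line with it.
            lines.append(current)
            current = token
    if current:
        lines.append(current)
    # Drop a trailing blank line left by a final paragraph break.
    while lines and lines[-1] == '':
        lines.pop()
    return '\n'.join(lines)


sample = "towards the ocean\nwith me.\n\nThere now is your\ninsular city"
print(layout_tokens(sample, 20))
```

Note that a run of three newlines yields only one '\n\n' token (the third newline matches neither alternative and is skipped), and the blank-line guard keeps four or more newlines from producing two blank lines.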