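I need to split a text into paragraphs and be able to work with each paragraph separately. How can I do that? Any two paragraphs may be separated by one or more blank lines. Like this: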
Hello world,
this is an example.
Let´s program something.
Creating new program.
Thanks.
Answer 0 (score: 1)
This should work:
text.split('\n\n')
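Answer 1 (score: 1)
This is not an entirely trivial problem, and the standard library does not seem to offer a ready-made solution.
The paragraphs in your example are separated by at least two newline characters, which unfortunately makes text.split("\n\n") unreliable: a run of three or more newlines leaves empty strings or stray leading newlines in the result.
I think splitting with a regular expression instead is a workable strategy: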
import fileinput
import re

NEWLINES_RE = re.compile(r"\n{2,}")  # two or more "\n" characters

def split_paragraphs(input_text=""):
    no_newlines = input_text.strip("\n")  # remove leading and trailing "\n"
    split_text = NEWLINES_RE.split(no_newlines)  # regex splitting
    paragraphs = [p + "\n" for p in split_text if p.strip()]
    # p + "\n" ensures that every paragraph ends with a newline
    # p.strip() is truthy only if the paragraph contains non-whitespace characters
    return paragraphs

# sample code, to split all script input files into paragraphs
text = "".join(fileinput.input())
for paragraph in split_paragraphs(text):
    print(f"<<{paragraph}>>\n")
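To illustrate why the regular expression matters, here is a minimal sketch comparing plain str.split with re.split. The sample string is hypothetical (one blank line, then two blank lines, between paragraphs) and is not taken from the question verbatim.

```python
import re

# Hypothetical sample: one blank line, then two blank lines, between paragraphs.
sample = (
    "Hello world,\nthis is an example.\n\n"
    "Let's program something.\n\n\n"
    "Creating new program.\n"
)

stripped = sample.strip("\n")

# Plain str.split consumes only two newlines at a time, so the extra
# blank line leaves a stray leading "\n" on the third piece.
naive = stripped.split("\n\n")

# Splitting on runs of two or more newlines absorbs the extra blank line.
robust = re.split(r"\n{2,}", stripped)

print(naive)
print(robust)
```

The third element of naive starts with a newline, while robust contains three clean paragraphs.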
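Edited to add:
A state-machine approach may be cleaner. Here is a fairly simple example using a generator function, which has the advantage of streaming through the input one line at a time without keeping a full copy of the input in memory.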
import fileinput

def split_paragraph2(input_lines):
    paragraph = []  # store current paragraph as a list
    for line in input_lines:
        if line.strip():  # True if line is non-empty (apart from whitespace)
            paragraph.append(line)
        elif paragraph:  # If we see an empty line, return paragraph (if any)
            yield "".join(paragraph)
            paragraph = []
    if paragraph:  # After end of input, return final paragraph (if any)
        yield "".join(paragraph)

# sample code, to split all script input files into paragraphs
for paragraph in split_paragraph2(fileinput.input()):
    print(f"<<{paragraph}>>\n")
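The same generator logic can be tried on an in-memory string instead of input files. The following is only a sketch: the function name split_paragraphs_stream and the sample text are hypothetical, and io.StringIO is used here because iterating over it yields lines with their trailing newline, much like a file object.

```python
import io

def split_paragraphs_stream(lines):
    """Yield paragraphs from an iterable of lines (hypothetical helper name)."""
    paragraph = []
    for line in lines:
        if line.strip():       # line has non-whitespace content
            paragraph.append(line)
        elif paragraph:        # blank line closes the current paragraph
            yield "".join(paragraph)
            paragraph = []
    if paragraph:              # flush the last paragraph at end of input
        yield "".join(paragraph)

# Hypothetical sample; note the separator line that contains only a space.
sample = (
    "Hello world,\nthis is an example.\n\n \n"
    "Let's program something.\n\n\n"
    "Creating new program.\n"
)

paragraphs = list(split_paragraphs_stream(io.StringIO(sample)))
print(paragraphs)
```

Because the check is line.strip(), a line holding only spaces or tabs also counts as a separator, which a pattern such as \n{2,} would not match.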
Answer 2 (score: 0)
Try:
result = list(filter(lambda x : x != '', text.split('\n\n')))
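Filtering out empty strings handles an even number of consecutive newlines, but an odd run (for example three) leaves a piece that starts with a stray newline. One possible variant, shown as a sketch with a hypothetical helper name split_and_clean, also trims newlines from each piece:

```python
def split_and_clean(text):
    """Split on blank lines, drop empty pieces, trim stray newlines."""
    pieces = text.split("\n\n")
    # p.strip() is falsy for empty or whitespace-only pieces;
    # p.strip("\n") removes newlines left over from odd-length runs.
    return [p.strip("\n") for p in pieces if p.strip()]

# Hypothetical sample with runs of 2, 3 and 4 newlines.
sample = "one\n\ntwo\n\n\nthree\n\n\n\nfour\n"
result = split_and_clean(sample)
print(result)
```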
Answer 3 (score: 0)
I usually strip the text before splitting and then filter out the empty strings. ;)
a =\
'''
Hello world,
this is an example.
Let´s program something.
Creating new program.
'''
data = [content for content in a.strip().splitlines() if content]
print(data)
Answer 4 (score: 0)
This worked for me:
text = "".join(text.splitlines())
text.split('something that is almost always used to separate sentences (i.e. a period, question mark, etc.)')
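Answer 5 (score: 0)
There is an easier way. I ran into the same problem.
Just replace each double newline (\n\n) with a character you rarely see in your text (here ¾), then split on that character: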
a ='''
Hello world,
this is an example.
Let´s program something.
Creating new program.'''
a = a.replace("\n\n" , "¾")
splitted_text = a.split('¾')
print(splitted_text)