Question

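I need to split a text into paragraphs and be able to work with each paragraph. How can I do that? There can be one or more blank lines between any two paragraphs, like this: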

Hello world,
  this is an example.

Let´s program something.


Creating  new  program.

Thanks.

Answer 1

This should work:

text.split('\n\n')

Answer 2

This is not an entirely trivial problem, and the standard library doesn't seem to have a ready-made solution.

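The paragraphs in your example are separated by at least two newline characters, which unfortunately makes text.split("\n\n") unreliable. I think splitting with a regular expression instead is a workable strategy: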

import fileinput
import re

NEWLINES_RE = re.compile(r"\n{2,}")  # two or more "\n" characters

def split_paragraphs(input_text=""):
    no_newlines = input_text.strip("\n")  # remove leading and trailing "\n"
    split_text = NEWLINES_RE.split(no_newlines)  # regex splitting

    paragraphs = [p + "\n" for p in split_text if p.strip()]
    # p + "\n" ensures that all lines in the paragraph end with a newline
    # p.strip() is truthy only if the paragraph contains non-whitespace characters

    return paragraphs

# sample code, to split all script input files into paragraphs
text = "".join(fileinput.input())
for paragraph in split_paragraphs(text):
    print(f"<<{paragraph}>>\n")
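
As a quick check, here is a minimal self-contained sketch that applies the same regex to the sample text from the question instead of reading from fileinput. The variable name `sample` is introduced here purely for illustration, and the function body is condensed from the one above.

```python
import re

NEWLINES_RE = re.compile(r"\n{2,}")  # two or more "\n" characters

def split_paragraphs(input_text=""):
    stripped = input_text.strip("\n")  # remove leading and trailing "\n"
    # keep only pieces that contain something other than whitespace
    return [p + "\n" for p in NEWLINES_RE.split(stripped) if p.strip()]

# hypothetical sample, copied from the question
sample = (
    "Hello world,\n"
    "  this is an example.\n"
    "\n"
    "Let´s program something.\n"
    "\n"
    "\n"
    "Creating  new  program.\n"
)

paragraphs = split_paragraphs(sample)
for paragraph in paragraphs:
    print(repr(paragraph))
```

One caveat: a separator line that contains only spaces or tabs is not matched by `\n{2,}`, so input like that would probably need a looser pattern such as `\n\s*\n`.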

Edited to add:

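A state-machine approach may be cleaner. Here is a fairly simple example using a generator function, which has the advantage of streaming through the input one line at a time without holding a full copy of the input in memory.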

import fileinput

def split_paragraph2(input_lines):
    paragraph = []  # store current paragraph as a list
    for line in input_lines:
        if line.strip():  # True if line is non-empty (apart from whitespace)
            paragraph.append(line)
        elif paragraph:  # If we see an empty line, return paragraph (if any)
            yield "".join(paragraph)
            paragraph = []
    if paragraph:  # After end of input, return final paragraph (if any)
        yield "".join(paragraph)

# sample code, to split all script input files into paragraphs
for paragraph in split_paragraph2(fileinput.input()):
    print(f"<<{paragraph}>>\n")
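
Because the generator only needs an iterable of lines, it can be tried without any input files. The sketch below repeats the function and feeds it the question's sample through `io.StringIO`; the names `sample` and `result` are assumptions made for this illustration.

```python
import io

def split_paragraph2(input_lines):
    paragraph = []  # lines of the paragraph currently being collected
    for line in input_lines:
        if line.strip():  # non-blank line: part of the current paragraph
            paragraph.append(line)
        elif paragraph:  # blank line: emit the collected paragraph, if any
            yield "".join(paragraph)
            paragraph = []
    if paragraph:  # emit the final paragraph after the input ends
        yield "".join(paragraph)

# hypothetical sample, copied from the question
sample = (
    "Hello world,\n"
    "  this is an example.\n"
    "\n"
    "Let´s program something.\n"
    "\n"
    "\n"
    "Creating  new  program.\n"
)

# io.StringIO yields lines with their trailing "\n", like an open file
result = list(split_paragraph2(io.StringIO(sample)))
print(result)
```

Unlike the regex version, this one also treats lines containing only spaces or tabs as paragraph separators, since `line.strip()` is empty for them.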

Answer 3

Try:

result = list(filter(lambda x : x != '', text.split('\n\n')))
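
On the question's sample, where one gap consists of two blank lines, this filter appears to leave a stray newline at the start of the following paragraph. The sketch below shows that, plus a variant that strips each piece; the variant and the name `cleaned` are assumptions, not part of the answer.

```python
# hypothetical sample, copied from the question
text = (
    "Hello world,\n"
    "  this is an example.\n"
    "\n"
    "Let´s program something.\n"
    "\n"
    "\n"
    "Creating  new  program."
)

# the filter from the answer: drops empty strings only
result = list(filter(lambda x: x != '', text.split('\n\n')))
print(result)  # the last item still starts with "\n"

# assumed variant: also strip leftover newlines from each piece
cleaned = [p.strip('\n') for p in text.split('\n\n') if p.strip()]
print(cleaned)
```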

Answer 4

I usually strip the text before splitting and then filter out the empty strings. ;)

a =\
'''
Hello world,
  this is an example.

Let´s program something.


Creating  new  program.


'''

data = [content for content in a.strip().splitlines() if content]

print(data)
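
Note that the list comprehension above yields individual non-empty lines, so the two lines of the first paragraph end up as separate items. If whole paragraphs are wanted, one possible adaptation (an assumption, not the answerer's code) is to split the stripped text on blank lines with `re.split`:

```python
import re

a = '''
Hello world,
  this is an example.

Let´s program something.


Creating  new  program.


'''

# as in the answer: one item per non-empty line
lines = [content for content in a.strip().splitlines() if content]

# assumed variant: one item per paragraph, split on runs of blank lines
paragraphs = [p for p in re.split(r"\n\s*\n", a.strip()) if p]

print(lines)
print(paragraphs)
```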

Answer 5

This worked for me:

text = "".join(text.splitlines())
# split on something that is almost always used to separate sentences (i.e. a period, question mark, etc.)
sentences = text.split('.')

Answer 6

Even easier. I had the same problem.

Just replace every double \n\n with a character you rarely see in your text (here ¾):

a ='''
Hello world,
  this is an example.

Let´s program something.


Creating  new  program.'''
a = a.replace("\n\n" , "¾")

splitted_text = a.split('¾')

print(splitted_text)
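
With this sample, the pieces keep a stray "\n" wherever the text starts with a newline or a gap is longer than one blank line. A minimal sketch of a cleanup step follows; the cleanup and the name `paragraphs` are assumptions added for illustration.

```python
a = '''
Hello world,
  this is an example.

Let´s program something.


Creating  new  program.'''

# the approach from the answer: mark paragraph breaks, then split on the marker
splitted_text = a.replace("\n\n", "¾").split("¾")
print(splitted_text)  # first and last pieces begin with "\n"

# assumed cleanup: strip newlines and drop pieces that are only whitespace
paragraphs = [p.strip("\n") for p in splitted_text if p.strip()]
print(paragraphs)
```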

Python - How to split text into paragraphs?
