How do I iterate over a string split after newline characters?

Asked: 2019-07-24 15:38:47

Tags: python python-3.x python-2.7 list newline

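I am trying to compare these sentences with each other. For example, I want to check whether BEFORE is the same as BEFORE THE, which it obviously is not. The problem is that a sentence can wrap across a line break, and I want to iterate past those line breaks so that BEFORE THE PARLIAMENT ON BRITAIN'S RELATIONS is treated as a single string. Below is a sample file.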

BEFORE

BEFORE THE

BEFORE THE PARLIAMENT

BEFORE THE PARLIAMENT ON

BEFORE THE PARLIAMENT ON
BRITAIN'S

BEFORE THE PARLIAMENT ON
BRITAIN'S RELATIONS

BEFORE THE PARLIAMENT ON
BRITAIN'S RELATIONS WITH

The way I do it now iterates over every line, so a sentence that spans more than one line gets split into separate pieces.

with open("test.txt") as f:
    data = f.readlines()
    data = [d.strip().split('\n') for d in data]

How can I iterate over this file and get each sentence one at a time, instead of iterating over every line?

4 answers:

Answer 0: (score: 2)

Split on double newlines, for example:

with open("test.txt") as f:
    # Split the whole text on blank lines; each piece is one sentence
    data = [d.strip() for d in f.read().split('\n\n')]

Answer 1: (score: 2)

with open("test.txt") as f:
    text = f.read()
    for line in text.split("\n\n"):
        line = line.replace("\n", " ")
        print(line)

I think this is what you want. You can split on two consecutive newlines and then replace the remaining newlines with spaces.

Output:

BEFORE
BEFORE THE
BEFORE THE PARLIAMENT
BEFORE THE PARLIAMENT ON
BEFORE THE PARLIAMENT ON BRITAIN'S
BEFORE THE PARLIAMENT ON BRITAIN'S RELATIONS
BEFORE THE PARLIAMENT ON BRITAIN'S RELATIONS WITH

Answer 2: (score: 1)

You can split on double newlines:

with open("test.txt") as f:
    data = f.read().split('\n\n')

However, you must make sure the blank lines contain no characters at all (for example, stray spaces).
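
One way around that caveat is a regular expression that also accepts whitespace-only separator lines. This is a minimal sketch and not part of the original answer; the helper name `split_paragraphs` and the sample string are made up for illustration.

```python
import re

def split_paragraphs(text):
    """Split text into sentences separated by blank or whitespace-only lines."""
    # \n\s*\n matches a newline, optional whitespace (including more newlines),
    # then another newline, so "  " on a separator line does not break the split.
    blocks = re.split(r"\n\s*\n", text.strip())
    # Rejoin the lines of each block with single spaces.
    return [" ".join(block.split()) for block in blocks]

# The second separator line contains two spaces (hypothetical sample data).
sample = "BEFORE\n\nBEFORE THE\n  \nBEFORE THE PARLIAMENT ON\nBRITAIN'S\n"
print(split_paragraphs(sample))
```

Note that `" ".join(block.split())` also collapses runs of spaces inside a sentence, which may or may not be what you want for the comparison.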

Answer 3: (score: 0)

A version using itertools.groupby. This works with any number of newlines between sentences:

from itertools import groupby

with open('file.txt', 'r') as f_in:
    txt = f_in.read()

out = []
for v, g in groupby(txt.splitlines(), lambda k: k != ''):
    if v:
        out.append(' '.join(g))


from pprint import pprint
pprint(out)

Prints:

['BEFORE',
 'BEFORE THE',
 'BEFORE THE PARLIAMENT',
 'BEFORE THE PARLIAMENT ON',
 "BEFORE THE PARLIAMENT ON BRITAIN'S",
 "BEFORE THE PARLIAMENT ON BRITAIN'S RELATIONS",
 "BEFORE THE PARLIAMENT ON BRITAIN'S RELATIONS WITH"]
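
If the file is large, the same grouping can probably be done lazily, without reading everything into memory first. The following is only a sketch under that assumption; `iter_sentences` is a hypothetical helper name, and `io.StringIO` stands in for an open file. Unlike the groupby version, it also treats whitespace-only lines as separators.

```python
import io

def iter_sentences(lines):
    """Yield one sentence per group of consecutive non-blank lines."""
    current = []
    for line in lines:
        stripped = line.strip()
        if stripped:
            current.append(stripped)
        elif current:
            # A blank line ends the sentence collected so far.
            yield " ".join(current)
            current = []
    if current:
        # Flush the last sentence when the input does not end with a blank line.
        yield " ".join(current)

# io.StringIO behaves like a file object opened in text mode.
fake_file = io.StringIO("BEFORE\n\n\nBEFORE THE PARLIAMENT ON\nBRITAIN'S\n")
for sentence in iter_sentences(fake_file):
    print(sentence)
```

With a real file, `with open("test.txt") as f: sentences = list(iter_sentences(f))` would be the equivalent call.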