Question

I have a few sentences of text in the following format:

Title: Presenting in a new Forum, Jun-01-2016  # Sentence 1
Source: xyz Website                            # Sentence 2
Type: Special Presentations                    # Sentence 3
From: 14/May/2016                              # Sentence 4
blah blah blah blah                            # Main Paragraph (stretches over 150 words)

How can I split them apart to get:

Title: Presenting in a new Forum, Jun-01-2016

and

Source: xyz Website

and

Type: Special Presentations

etc.

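I have tried el2.split(), but that breaks everything into individual words. I am trying to convert the text into a list so that I can handle each of the Sentences separately from the Main Paragraph.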

Answer 1

If you will always have a title, source, type and from (one line each), followed by a paragraph of any number of lines:

splitted_file = string.splitlines()

title = splitted_file[0]  # list indices start at 0
source = splitted_file[1]
type = splitted_file[2]  # note: this shadows the built-in type()
_from = splitted_file[3]  # can't use 'from' as a variable name
paragraph = '\n'.join(splitted_file[4:])

print(title)
>> Title: Presenting in a new Forum, Jun-01-2016

print(source)
>> Source: xyz Website

print(type)
>> Type: Special Presentations

print(_from)
>> From: 14/May/2016

print(paragraph)
>> blah blah blah blah
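
If the layout really is fixed at four header lines followed by the paragraph, the same split can be sketched in one step with str.split and a maxsplit of 4, which leaves the whole paragraph in one piece. The sample text and the names doc_type and from_ below are assumptions made for illustration, not part of the original answer.

```python
# Sample text assumed for illustration: four header lines, then the paragraph.
text = (
    "Title: Presenting in a new Forum, Jun-01-2016\n"
    "Source: xyz Website\n"
    "Type: Special Presentations\n"
    "From: 14/May/2016\n"
    "blah blah blah blah\n"
    "more blah on a second line"
)

# Split on at most 4 newlines: 4 header lines, then everything else as one string.
# doc_type and from_ are illustrative names that avoid the built-in `type`
# and the keyword `from`.
title, source, doc_type, from_, paragraph = text.split("\n", 4)

print(title)
print(paragraph)
```

This raises ValueError if the text has fewer than five lines, so it only suits input whose shape is known in advance.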

Answer 2

If it comes from a text file, just call .readlines() to get a list of lines (each one still ending in its newline). If it is a string, split it on '\n'.
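
A minimal sketch of the file case, using io.StringIO as a stand-in for a real file object so the example is self-contained. The sample content is an assumption; with a real file you would use open(...) in a with block instead.

```python
import io

# io.StringIO stands in for a file opened with open(...); assumed for illustration.
fake_file = io.StringIO("Title: A\nSource: B\nbody text\n")

# readlines() keeps the trailing '\n' on each line, so strip it off.
lines = [line.rstrip("\n") for line in fake_file.readlines()]

print(lines)  # ['Title: A', 'Source: B', 'body text']
```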

Answer 3

Assuming the first word of each sentence ends with ':' and the first word of the paragraph's first line does not end with ':', the following code should work:

string = """\
Title: Presenting in a new Forum, Jun-01-2016  
Source: xyz Website                            
Type: Special Presentations                    
From: 14/May/2016                              
blah blah blah blah                            # Main Paragraph (stretches over 150 words)
"""

paragraph = ''
# when we start the paragraph, there are no more sentences
paragraph_start = False

for line in string.splitlines():
    words = line.split()
    # a blank line has no first word, so check before indexing
    if words and words[0].endswith(':') and not paragraph_start:
        print('a Sentence:', line)
    else:
        paragraph_start = True
        paragraph += line + '\n'


print('the paragraph:', paragraph)
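
The same idea can be sketched as a function that returns the header lines as a dictionary instead of printing them. The function name split_headers and the sample text are hypothetical, and the assumption is still that a header line starts with a word ending in ':'.

```python
def split_headers(text):
    """Return (headers, paragraph) from 'Key: value' lines followed by free text."""
    headers = {}
    body = []
    in_paragraph = False
    for line in text.splitlines():
        words = line.split()
        if not in_paragraph and words and words[0].endswith(":"):
            # partition splits on the first ':' only, so values may contain colons
            key, _, value = line.partition(":")
            headers[key.strip()] = value.strip()
        else:
            # the first non-header line starts the paragraph; everything after belongs to it
            in_paragraph = True
            body.append(line)
    return headers, "\n".join(body)


sample = """Title: Presenting in a new Forum, Jun-01-2016
Source: xyz Website
Type: Special Presentations
From: 14/May/2016
blah blah blah blah
note: this line stays in the paragraph"""

headers, paragraph = split_headers(sample)
print(headers["Title"])
print(paragraph)
```

A dictionary keeps the field names attached to their values, which avoids relying on the headers appearing in a fixed order.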

Answer 4

If that is the string you want to work with:

el2.splitlines()

will split the string at every new line. If you need to keep the newline characters (\n) in the resulting strings, you can use:

el2.splitlines(keepends=True)
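
To make the difference concrete, here is a small sketch on an assumed two-line string. Any truthy argument to splitlines behaves like keepends=True, and split('\n') is shown for comparison because it leaves a trailing empty string when the text ends with a newline.

```python
# Short sample string assumed for illustration.
text = "Title: A\nSource: B\n"

without_newlines = text.splitlines()
with_newlines = text.splitlines(keepends=True)
plain_split = text.split("\n")

print(without_newlines)  # ['Title: A', 'Source: B']
print(with_newlines)     # ['Title: A\n', 'Source: B\n']
print(plain_split)       # ['Title: A', 'Source: B', '']
```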

If el2 is a file, you would do this instead:

>>> file = open('el2.txt').read()
>>> file
>>> file.split('\n')

This again splits the text at every newline.

Finally, once you have a list, you may want to store its items in separate variables (not recommended for large lists), which you can do like this:

a = el2.splitlines()  # split() would break the text into single words
title = a[0]
source = a[1]

Here is a PythonFiddle: http://pythonfiddle.com/split-and-save
