How to split text into individual list items

Time: 2016-07-08 16:26:16

Tags: python

I have some text in the format below:

Title: Presenting in a new Forum, Jun-01-2016  # Sentence 1
Source: xyz Website                            # Sentence 2
Type: Special Presentations                    # Sentence 3
From: 14/May/2016                              # Sentence 4
blah blah blah blah                            # Main Paragraph (stretches over 150 words)

How can I split it up to get:

Title: Presenting in a new Forum, Jun-01-2016

Source: xyz Website

Type: Special Presentations

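I have tried el2.split(), but that splits everything into individual words. I am trying to convert the text into a list so that I can handle the Sentences separately from the Main Paragraph.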

4 answers:

Answer 0: (score: 2)

If you will always have Title, Source, Type and From (one line each), followed by a paragraph of any number of lines:

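# 'string' holds the whole block of text
splitted_file = string.splitlines()

# list indices start at 0, so the title is element 0
title = splitted_file[0]
source = splitted_file[1]
type = splitted_file[2]   # note: this shadows the built-in 'type'
_from = splitted_file[3]  # can't use 'from' as a variable name
paragraph = '\n'.join(splitted_file[4:])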

print(title)
>> Title: Presenting in a new Forum, Jun-01-2016

print(source)
>> Source: xyz Website

print(type)
>> Type: Special Presentations

print(_from)
>> From: 14/May/2016

print(paragraph)
>> blah blah blah blah
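The same idea can be written with starred unpacking, which avoids counting indices by hand. This is only a sketch: the function name `split_record` and the variable names `kind` and `origin` are made up for illustration, and it assumes the text always starts with exactly four header lines (fewer than four lines raises a `ValueError`).

```python
def split_record(text):
    """Split a record into its four header lines and the remaining paragraph."""
    # Starred unpacking: the first four lines get names, the rest go to `body`.
    title, source, kind, origin, *body = text.splitlines()
    return title, source, kind, origin, "\n".join(body)


sample = (
    "Title: Presenting in a new Forum, Jun-01-2016\n"
    "Source: xyz Website\n"
    "Type: Special Presentations\n"
    "From: 14/May/2016\n"
    "blah blah blah blah\n"
    "more blah\n"
)

title, source, kind, origin, paragraph = split_record(sample)
print(title)      # the Title line
print(paragraph)  # the paragraph lines, joined by a newline
```

Using `kind` and `origin` instead of `type` and `from` sidesteps both the built-in name and the reserved keyword.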

Answer 1: (score: 0)

If it comes from a text file, just call .readlines(), which returns a list of lines. If it is a string, split on '\n'.
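The two approaches do not return quite the same thing, which the following sketch illustrates. It assumes a short made-up sample and uses `io.StringIO` as a stand-in for a real open file.

```python
import io

text = "Title: A\nSource: B\nbody text\n"

# io.StringIO stands in for a real open file in this sketch.
with io.StringIO(text) as fh:
    from_file = fh.readlines()

from_split = text.split("\n")
from_splitlines = text.splitlines()

print(from_file)        # each item still ends with '\n'
print(from_split)       # no '\n', but a trailing '' because text ends with '\n'
print(from_splitlines)  # no '\n' and no trailing ''
```

If the trailing newline characters are unwanted, `line.rstrip('\n')` on each item from `readlines()` removes them.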

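Answer 2: (score: 0)

Assuming the first word of each sentence ends with ':' and the first word of the paragraph's first line does not, the following code should work: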

string = """\
Title: Presenting in a new Forum, Jun-01-2016  
Source: xyz Website                            
Type: Special Presentations                    
From: 14/May/2016                              
blah blah blah blah                            # Main Paragraph (stretches over 150 words)
"""

paragraph = ''
# when we start the paragraph, there are no more sentences
paragraph_start = False

for line in string.splitlines():
    words = line.split()
    # a blank line has no first word, so check 'words' before indexing
    if not paragraph_start and words and words[0].endswith(':'):
        print('a Sentence:', line)
    else:
        paragraph_start = True
        paragraph += line + '\n'


print('the paragraph:', paragraph)
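If the header values are needed on their own rather than just printed, the same loop can collect them into a dictionary. This is a sketch under the same assumption as above (header lines start with a word ending in ':'); the function name `parse_record` is hypothetical, and a blank line before the headers would be treated as the start of the paragraph.

```python
def parse_record(text):
    """Return (headers, paragraph) for text shaped like the question's sample."""
    headers = {}
    body = []
    in_body = False
    for line in text.splitlines():
        words = line.split()
        if not in_body and words and words[0].endswith(":"):
            # partition splits at the first ':' only, so values may contain ':'
            label, _, value = line.partition(":")
            headers[label.strip()] = value.strip()
        else:
            in_body = True
            body.append(line)
    return headers, "\n".join(body)


sample = """\
Title: Presenting in a new Forum, Jun-01-2016
Source: xyz Website
Type: Special Presentations
From: 14/May/2016
blah blah blah blah
"""

headers, paragraph = parse_record(sample)
print(headers["Title"])
print(paragraph)
```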

Answer 3: (score: 0)

If that is the string you want to work with:

el2.splitlines()

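will split the string at every newline. If you need to keep the newline characters (\n) in the resulting strings, pass a true value for the keepends argument:

el2.splitlines(True)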
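A small sketch of the difference, using a made-up three-line value for `el2`:

```python
el2 = "Title: A\nSource: B\nbody text\n"

without_ends = el2.splitlines()
with_ends = el2.splitlines(keepends=True)

print(without_ends)  # ['Title: A', 'Source: B', 'body text']
print(with_ends)     # ['Title: A\n', 'Source: B\n', 'body text\n']

# Because the line endings are kept, joining gives back the original text.
assert "".join(with_ends) == el2
```

Any truthy argument switches on `keepends`, but spelling it out as `True` or `keepends=True` makes the intent clearer.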

If el2 is a file, you will want to do this:

>>> file = open('el2.txt').read()
>>> file
>>> file.split('\n')

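This again splits the text at every newline.

Finally, once you have a list, you may want to store its items in separate variables (not recommended when you have large lists), which you can do like this: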

a = el2.splitlines()  # split() would break the text into words, not lines
title = a[0]
source = a[1]

Here is a PythonFiddle: http://pythonfiddle.com/split-and-save