I have a few sentences of text in the following format:
Title: Presenting in a new Forum, Jun-01-2016 # Sentence 1
Source: xyz Website # Sentence 2
Type: Special Presentations # Sentence 3
From: 14/May/2016 # Sentence 4
blah blah blah blah # Main Paragraph (stretches over 150 words)
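How can I split them apart to get:
Title: Presenting in a new Forum, Jun-01-2016
and
Source: xyz Website
and
Type: Special Presentations
and so on?
I have tried el2.split(), but that splits everything into individual words. I am trying to convert the text into a list so that I can pull out the Sentences separately from the Main Paragraph.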
Answer 0 (score: 2)
If you will always have Title, Source, Type and From (one line each), followed by a paragraph of an arbitrary number of lines:
lines = string.splitlines()  # `string` holds the text from the question
title = lines[0]  # list indices start at 0, so the first line is lines[0]
source = lines[1]
type_ = lines[2]  # trailing underscore avoids shadowing the built-in type()
_from = lines[3]  # can't use 'from' as a variable name
paragraph = '\n'.join(lines[4:])
print(title)
>> Title: Presenting in a new Forum, Jun-01-2016
print(source)
>> Source: xyz Website
print(type_)
>> Type: Special Presentations
print(_from)
>> From: 14/May/2016
print(paragraph)
>> blah blah blah blah
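The same fixed layout can also be unpacked in a single step. This is only a minimal sketch, assuming the text always starts with exactly four header lines; the name `text` and its sample value are placeholders rather than anything from the question:

```python
# Sample input; `text` is a placeholder for the real string (el2 in the question).
text = (
    "Title: Presenting in a new Forum, Jun-01-2016\n"
    "Source: xyz Website\n"
    "Type: Special Presentations\n"
    "From: 14/May/2016\n"
    "blah blah blah blah\n"
    "more of the main paragraph\n"
)

# maxsplit=4 splits at the first four newlines only, so everything after the
# fourth header line stays together as one paragraph string.
title, source, type_, from_, paragraph = text.split("\n", 4)

print(title)
print(paragraph)
```

If the text has fewer than four line breaks, the unpacking raises a ValueError, so this only suits input that is known to follow the layout.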
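Answer 1 (score: 0)
If the text comes from a text file, just call .readlines(), which returns a list of lines (each one still ending in its newline character). If it is a string, split it on '\n'.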
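Both variants might look roughly like this. The sketch simulates the file with io.StringIO so that it runs without anything on disk (an assumption made for the example); with a real file you would call open() on your own path instead. The sample text is made up:

```python
import io

text = "Title: A\nSource: B\nType: C\n"

# io.StringIO stands in for a real file object here so the sketch is
# self-contained; a file returned by open() offers the same readlines() method.
with io.StringIO(text) as handle:
    file_lines = handle.readlines()  # each item keeps its trailing '\n'

# Splitting the string drops the '\n' characters, but a text that ends in a
# newline leaves one empty string at the end of the list.
string_lines = text.split("\n")

print(file_lines)
print(string_lines)
```

Because of that trailing empty string, str.splitlines() is often the more convenient choice for strings.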
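Answer 2 (score: 0)
Assuming the first word of each sentence ends with ':' and the first word of the paragraph's first line does not, the following code should work: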
string = """\
Title: Presenting in a new Forum, Jun-01-2016
Source: xyz Website
Type: Special Presentations
From: 14/May/2016
blah blah blah blah # Main Paragraph (stretches over 150 words)
"""
paragraph = ''
# when we start the paragraph, there are no more sentences
paragraph_start = False
for line in string.splitlines():
    words = line.split()
    # a blank line has no first word, so treat it as part of the paragraph
    if words and words[0].endswith(':') and not paragraph_start:
        print('a Sentence:', line)
    else:
        paragraph_start = True
        paragraph += line + '\n'
print('the paragraph:', paragraph)
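If the header values are needed without their labels, the same idea can collect them into a dictionary. This is a rough sketch under the same assumptions as the loop above; the function name `parse_entry` is made up for illustration:

```python
def parse_entry(text):
    """Split text into a dict of header fields and the remaining paragraph."""
    fields = {}
    paragraph_lines = []
    in_paragraph = False
    for line in text.splitlines():
        words = line.split()
        if not in_paragraph and words and words[0].endswith(":"):
            # partition splits at the first ':' only, so a value that itself
            # contains ':' stays intact
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        else:
            # the first non-header line starts the paragraph
            in_paragraph = True
            paragraph_lines.append(line)
    return fields, "\n".join(paragraph_lines)


sample = """\
Title: Presenting in a new Forum, Jun-01-2016
Source: xyz Website
Type: Special Presentations
From: 14/May/2016
blah blah blah blah
"""

fields, paragraph = parse_entry(sample)
print(fields["Title"])
print(paragraph)
```

Returning a dictionary means new header lines need no extra variables, at the cost of having to look values up by key.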
Answer 3 (score: 0)
If that is the string you want to work with:
el2.splitlines()
will split the string at every line break. If you need to keep the newline characters (\n) in the resulting strings, you can use:
el2.splitlines(keepends=True)
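The argument here is the keepends flag, and any true value switches it on. The difference is easiest to see side by side; a small sketch with a made-up two-line string:

```python
el2 = "Title: A\nSource: B\n"

without_newlines = el2.splitlines()            # line breaks are dropped
with_newlines = el2.splitlines(keepends=True)  # line breaks are kept

print(without_newlines)
print(with_newlines)
```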
If el2 is a file, you will want to do this:
>>> file = open('el2.txt').read()
>>> file
>>> file.split('\n')
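This again splits the text at every line break.
Finally, once you have a list, you may want to store its items in separate variables (not recommended for large lists), which you can do like this: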
a = el2.splitlines()  # split() would break the text into single words
title = a[0]
source = a[1]