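I have a .txt file that contains 4 texts, and I want to build a list in which each of the four texts is a single item, so the list ends up with 4 objects. The code should do something like this: read the text line by line, appending each line to the current document, but start a new item as soon as it reaches a line such as "1 doc of x". I have tried the following, but it does not produce what I want: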
with open('testfile.txt') as f:
    myList = f.readlines()
# Strip whitespace from each line (this still gives one item per line, not per document).
myList = [x.strip() for x in myList]
testfile.txt
1 doc of 4
Hello World.
This is another question
2 doc of 4
This is a new text file.
Not much in it.
3 doc of 4
This is the third text.
It contains separate info.
4 doc of 4
The final text.
A short one.
Expected output for myList:
myList=['Hello World. This is another question',
'This is a new text file. Not much in it.',
'This is the third text. It contains separate info.',
'The final text. A short one.']
Answer 0 (score: 0)
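OK.
Something like this should work. Note, however, that it will crash if the document does not start with a header line, because `myList[-1]` is looked up before any group has been created.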
import re

# This will hold each document as a list of lines.
# To begin with, there are no documents.
myList = []

# Define a regular expression to match header lines.
header_line_re = re.compile(r'\d+ doc of \d+')

with open('testfile.txt') as f:
    for line in f:  # For each line...
        line = line.strip()  # Remove leading and trailing whitespace
        if header_line_re.match(line):  # If the line matches the header line regular expression...
            myList.append([])  # Start a new group within `myList`,
            continue  # then skip processing the line further.
        if line:  # If the line is not empty, simply add it to the last group.
            myList[-1].append(line)

# Recompose the lines back to strings (separated by spaces, not newlines).
myList = [' '.join(doc) for doc in myList]
print(myList)
The output is:
[
"Hello World. This is another question",
"This is a new text file. Not much in it.",
"This is the third text. It contains separate info.",
"The final text. A short one.",
]
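If the crash on a file without a leading header line is a concern, one possible variation is sketched below. It is not part of the answer above: the function name `split_documents` and the choice to treat any text before the first header as its own document are assumptions made for illustration, and dropping such text instead would be equally reasonable.

```python
import re

# Same header pattern as in the answer above.
HEADER_LINE_RE = re.compile(r'\d+ doc of \d+')


def split_documents(lines):
    """Group an iterable of lines into documents, one per header line.

    Hypothetical helper: text that appears before the first header is
    kept as a separate document instead of raising IndexError.
    """
    docs = []
    current = None  # No group is open until a header or stray text appears.
    for raw in lines:
        line = raw.strip()
        if HEADER_LINE_RE.match(line):
            # A header always opens a fresh group.
            current = []
            docs.append(current)
        elif line:
            if current is None:
                # Assumption: stray leading text becomes its own document.
                current = []
                docs.append(current)
            current.append(line)
    # Join each group's lines with spaces, as in the answer above.
    return [' '.join(doc) for doc in docs]


if __name__ == '__main__':
    sample = [
        'Stray intro line.\n',
        '1 doc of 2\n',
        'Hello World.\n',
        'This is another question\n',
        '2 doc of 2\n',
        'The final text.\n',
    ]
    print(split_documents(sample))
```

Because the function accepts any iterable of lines, it can also be called on an open file, for example `with open('testfile.txt') as f: myList = split_documents(f)`.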