I have a data.txt file that looks like this:
<<a
<<t This is a title 01
/t>>
<<c
This is a sentence. This is a sentence. This is a sentence. This is a sentence.
This is a sentence. This is a sentence. This is a sentence. This is a sentence.
/c>>
/a>>
<<a
<<t This is a title 02
/t>>
<<c
This is a sentence. This is a sentence. This is a sentence. This is a sentence.
This is a sentence. This is a sentence. This is a sentence. This is a sentence.
/c>>
/a>>
I want to read the file and split it into a list with one entry per title and per sentence, for example:
[[This is a title 01],[This is a sentence.],[This is a sentence.]...[This is a title 02],[This is a sentence.]...]
Thanks in advance for your help.
Answer 0 (score: 0)
You can try the following:
result = []
with open('data.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if line.startswith('<<t'):
            # Title line: drop the '<<t' marker and surrounding whitespace
            result.append(line[len('<<t'):].strip())
        elif line.startswith("This is a sentence"):
            # Sentence line: split at periods and skip the empty pieces
            for sentence in line.split('.'):
                if sentence.strip():
                    result.append(sentence.strip() + '.')
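How does this work?
Open the file and iterate over it line by line.
Extract the title by stripping the <<t marker and the surrounding whitespace.
To extract the sentences, split the line string at each period (.). Then append everything to the result list.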
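The steps above can be sketched as a small function that works on any iterable of lines, so it can be tried without a file. The name `split_records` and the choice to recognize lines by their markers rather than by literal text are assumptions made for illustration, not part of the original answer. It also assumes every sentence ends with a period.

```python
def split_records(lines):
    """Return titles and sentences from lines in the <<a ... /a>> format as a flat list."""
    result = []
    for raw in lines:
        line = raw.strip()
        if line.startswith('<<t'):
            # Title line: slice off the 3-character '<<t' marker, then trim whitespace.
            result.append(line[3:].strip())
        elif line and not line.startswith('<<') and not line.endswith('>>'):
            # Content line: split at periods, skip empty pieces, restore the period.
            for piece in line.split('.'):
                piece = piece.strip()
                if piece:
                    result.append(piece + '.')
    return result


# In-memory sample mirroring the question's format (shortened for illustration).
sample = """<<a
<<t This is a title 01
/t>>
<<c
This is a sentence. This is a sentence.
/c>>
/a>>""".splitlines()

print(split_records(sample))
```

Because an open file object is itself an iterable of lines, the same function could be called as `split_records(f)` inside the `with open(...)` block.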
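Edit:
Note: you will end up with a list of strings. Since you are new to Python, I will leave converting the list of strings into a list of lists as an exercise. It should be very simple.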
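For reference, one way to do that conversion is a list comprehension that wraps each string in its own one-element list. This is only a minimal sketch. The values in `result` are sample data made up to mirror the question, and the name `nested` is hypothetical.

```python
# Flat list of strings (sample values for illustration).
result = ['This is a title 01', 'This is a sentence.', 'This is a sentence.']

# Wrap each string in its own single-element list.
nested = [[item] for item in result]

print(nested)
# [['This is a title 01'], ['This is a sentence.'], ['This is a sentence.']]
```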