I have a document in the following format, and I want to split it into sections with Python, for example:
Outline:
1. Lorem Ipsum
2. Lorem Ipsum
Preface:
This is sample generated words of the documents
It must be split into an array, for example:
[Outline: 1. Lorem Ipsum 2. Lorem Ipsum, Preface: This is sample generated words of the documents ]
or stored in separate variables, for example:
outline = segment_by_word("outline")
preface = segment_by_word("preface")
print(preface)  # This is sample generated words of the documents
Answer 0 (score: 0)
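I am assuming there are only two categories, Outline and Preface. The code below appends each line to the list for its category as a tuple containing the line number and the line's text.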
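lines_by_category = {'Outline': [], 'Preface': []}
category = None
for count, line in enumerate(lines):  # Assuming you know how to get to the point of reading lines
    # Use `in` rather than str.find(): find() returns -1 (truthy) when the text is
    # missing and 0 (falsy) when it is at the start of the line.
    if 'Outline:' in line:
        category = 'Outline'
    elif 'Preface:' in line:
        category = 'Preface'
    if category is None:
        continue  # Skip lines that come before the first heading
    category_list = lines_by_category[category]
    category_list.append((count, line))  # Updates the original list because it is pointing to the same one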
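
To get a result closer to the shape the question asks for, the same idea can be extended so that each heading maps to its joined text. This is only a minimal sketch: it assumes the document is already in a string, that a heading is a line made up of a known name followed by a colon, and the function name `split_sections` and its `headings` parameter are hypothetical, not something from the question.

```python
def split_sections(text, headings=("Outline", "Preface")):
    """Split text into {heading: body}, using lines such as 'Outline:' as markers."""
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: a heading line is exactly a known name followed by a colon.
        if stripped.endswith(":") and stripped[:-1] in headings:
            current = stripped[:-1]
            sections[current] = []
        elif current is not None and stripped:
            # Non-empty lines belong to the most recent heading.
            sections[current].append(stripped)
    # Join each section's lines with single spaces, as in the question's example.
    return {name: " ".join(body) for name, body in sections.items()}


document = """Outline:
1. Lorem Ipsum
2. Lorem Ipsum
Preface:
This is sample generated words of the documents"""

sections = split_sections(document)
print(sections["Preface"])  # This is sample generated words of the documents
print(sections["Outline"])  # 1. Lorem Ipsum 2. Lorem Ipsum
```

Returning a dictionary covers both requests: `list(sections.items())` gives the array-like form, and `sections["Preface"]` plays the role of the separate variable.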