Many variants of this question have been asked, but I haven't found one with this particular set of criteria.
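I'm reading lines from a file into a list. I want to create sublists that begin with a line starting with "0" and end with a line starting with "TR ID". For example: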
lines = ['0 01 31DEC', '18:19:08', 'TR ID: 308', '0 02 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308', '0 03 31DEC', '18:19:08', 'TR ID: 308', '0 04 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308']
The desired result is:
desired_result = [['0 01 31DEC', '18:19:08', 'TR ID: 308'], ['0 02 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308'], ['0 03 31DEC', '18:19:08', 'TR ID: 308'], ['0 04 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308']]
I've tried getting the index of every start line and end line so I can slice the list between them, but that seems clumsy. Is there a better way?
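For comparison, here is a minimal sketch of the index-pairing idea described above. It assumes start lines begin with "0 " and end lines begin with "TR ID"; the helper name `segment_by_indices` and the `junk`/`noise` sample lines are hypothetical. Each end marker is paired with the nearest start marker that comes after the previous end marker, so lines sitting between blocks are never included.

```python
def segment_by_indices(lines, start="0 ", end="TR ID"):
    """Hypothetical helper: collect marker indices, pair them, then slice."""
    starts = [i for i, line in enumerate(lines) if line.startswith(start)]
    ends = [i for i, line in enumerate(lines) if line.startswith(end)]
    result = []
    prev_end = -1
    for e in ends:
        # Start markers between the previous end marker and this one;
        # the nearest one (the last in the list) opens this block.
        candidates = [s for s in starts if prev_end < s < e]
        if candidates:
            result.append(lines[candidates[-1]:e + 1])  # +1 keeps the end line
        prev_end = e
    return result


# Made-up input with unrelated lines between the blocks
sample = ['junk', '0 01 31DEC', '18:19:08', 'TR ID: 308',
          'noise', '0 02 31DEC', 'ATC ID: 21232', 'TR ID: 308']
print(segment_by_indices(sample))
# [['0 01 31DEC', '18:19:08', 'TR ID: 308'], ['0 02 31DEC', 'ATC ID: 21232', 'TR ID: 308']]
```

The list comprehension inside the loop makes this quadratic in the worst case, which is fine for short files; a single pass over the lines avoids that.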
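Update:
These are all good pointers and I've made some progress, but I should have mentioned that there are blocks of unrelated lines between the sublists I want. That means a start line has to be specified, not just an end line: without a start condition, some sublists pick up unrelated lines at their beginning.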
Answer 0 (score: 1)
Try iterating:
outlist = []; templist = []
for i in lines:
    if i.startswith("TR ID"):  # ending criteria alone seems sufficient for this data
        templist.append(i)
        outlist.append(templist)
        templist = []
    else:
        templist.append(i)
for o in outlist:  # see created list
    print(o)
Output:
['0 01 31DEC', '18:19:08', 'TR ID: 308']
['0 02 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308']
['0 03 31DEC', '18:19:08', 'TR ID: 308']
['0 04 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308']
Answer 1 (score: 1)
This is a good place for a generator:
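def segment(lines):
    out = []
    for line in lines:
        out.append(line)
        if line.startswith('TR ID'):
            yield out  # hand back one finished sublist
            out = []   # start collecting the next one

result = list(segment(lines))  # consume the generator into a list of sublists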
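A minimal sketch of how this generator could also honour a start marker, as the update in the question asks. It assumes blocks start at a line beginning with "0 " and end at a line beginning with "TR ID"; the name `segment_between` is hypothetical. Lines outside a start..end pair are skipped, and if a second start marker appears before an end marker, the partial block is discarded and collection restarts there.

```python
def segment_between(lines, start="0 ", end="TR ID"):
    """Yield sublists that begin at a `start` line and end at an `end` line."""
    out = None  # None means "not currently inside a block"
    for line in lines:
        if line.startswith(start):
            out = [line]  # a start marker (re)opens a block
        elif out is not None:
            out.append(line)
            if line.startswith(end):
                yield out
                out = None  # ignore everything until the next start marker


# Made-up input with unrelated lines between the blocks
sample = ['junk', '0 01 31DEC', '18:19:08', 'TR ID: 308',
          'noise', '0 02 31DEC', 'ATC ID: 21232', 'TR ID: 308']
print(list(segment_between(sample)))
# [['0 01 31DEC', '18:19:08', 'TR ID: 308'], ['0 02 31DEC', 'ATC ID: 21232', 'TR ID: 308']]
```

Because it is a generator, it can consume a file object directly (after stripping newlines) without reading the whole file into a list first. A trailing block with no end marker is silently dropped.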
Answer 2 (score: 0)
lines = ['0 01 31DEC', '18:19:08', 'TR ID: 308', '0 02 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308', '0 03 31DEC',
         '18:19:08', 'TR ID: 308', '0 04 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308']
# Use len(lines), not -1, as the final boundary: slicing up to -1 would drop the last element
indices = [i for i, line in enumerate(lines) if line.startswith('0')] + [len(lines)]
result = [lines[indices[i]:indices[i + 1]] for i in range(len(indices) - 1)]
Answer 3 (score: 0)
I'd take a recursive approach:
lines = ['0 01 31DEC', '18:19:08', 'TR ID: 308', '0 02 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308', '0 03 31DEC', '18:19:08', 'TR ID: 308', '0 04 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308']
final_result = []
def hello(lines_1):
    data = []
    if not lines_1:
        return 0
    else:
        for j, i in enumerate(lines_1):
            if i.startswith('TR ID'):
                data.append(i)
                final_result.append(data)
                return hello(lines_1[j+1:])
            else:
                data.append(i)
hello(lines)
print(final_result)
Output:
[['0 01 31DEC', '18:19:08', 'TR ID: 308'], ['0 02 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308'], ['0 03 31DEC', '18:19:08', 'TR ID: 308'], ['0 04 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308']]