Python: Parse an inconsistent list into a list of lists, specifying the first and last item of each sublist

Time: 2018-01-19 04:07:43

Tags: python list

Many variations of this question have been asked, but I haven't found one with this particular set of criteria.

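I am reading lines from a file into a list. I want to create sublists that each start with a line beginning with "0" and end with a line beginning with "TR ID".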

For example:

lines = ['0 01 31DEC', '18:19:08', 'TR ID: 308', '0 02 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308', '0 03 31DEC', '18:19:08', 'TR ID: 308', '0 04 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308']

The desired result is:

desired_result = [['0 01 31DEC', '18:19:08', 'TR ID: 308'], ['0 02 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308'], ['0 03 31DEC', '18:19:08', 'TR ID: 308'], ['0 04 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308']]

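I have tried collecting the index of every start and end line so that I can slice the list between them, but this seems clunky. Is there a better way?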

Update:

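These are all good directions and I have made some progress, but I should have mentioned that there are blocks of irrelevant lines between the desired sublists, so a start line needs to be specified, not just an end line. Without a specified start line, some sublists pick up irrelevant lines at their beginning.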
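One way to honor both markers is to track whether a block is currently open. The following is a minimal sketch rather than code from any answer in this thread: the function name `split_blocks`, its parameters, and the 'noise' lines in the sample are assumptions made for illustration.

```python
def split_blocks(lines, start="0", end="TR ID"):
    """Collect runs of lines that open with `start` and close with `end`."""
    blocks = []
    current = None  # None means we are between blocks
    for line in lines:
        if current is None:
            if line.startswith(start):
                current = [line]  # open a new block
            # any other line seen between blocks is unrelated and is skipped
        else:
            current.append(line)
            if line.startswith(end):
                blocks.append(current)  # close the block
                current = None
    return blocks


# Hypothetical input: the 'noise' lines stand in for the unrelated blocks
sample = ['noise A', '0 01 31DEC', '18:19:08', 'TR ID: 308',
          'noise B', 'noise C',
          '0 02 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308']
print(split_blocks(sample))
```

In this sketch a block that is opened but never reaches an end line is dropped, and an unrelated line that happens to begin with the start marker would open a block, so the markers may need tightening for real data.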

4 Answers:

Answer 0: (score: 1)

Try iterating:

outlist = []; templist = []
for i in lines:
    if i.startswith("TR ID"):   # ending criteria alone seems sufficient for this data
        templist.append(i)
        outlist.append(templist)
        templist = []
    else:
        templist.append(i)

for o in outlist:  # see created list
    print(o)

Output:

['0 01 31DEC', '18:19:08', 'TR ID: 308']
['0 02 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308']
['0 03 31DEC', '18:19:08', 'TR ID: 308']
['0 04 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308']

Answer 1: (score: 1)

This is a good place for a generator:

def segment(lines):
    out = []
    for line in lines:
        out.append(line) 
        if line.startswith('TR ID'):
            yield out
            out = []
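
To get the sublists out of the generator, either wrap the call in `list()` or iterate over it. A short usage sketch, repeating the generator so it runs on its own and using a shortened copy of the sample data (the shortening is only for brevity):

```python
def segment(lines):
    # Same generator as above, repeated so this snippet runs standalone
    out = []
    for line in lines:
        out.append(line)
        if line.startswith('TR ID'):
            yield out
            out = []


# Shortened copy of the question's sample data
lines = ['0 01 31DEC', '18:19:08', 'TR ID: 308',
         '0 02 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308']

result = list(segment(lines))  # materialize every sublist at once
for block in segment(lines):   # or consume the sublists one at a time
    print(block)
```

Note that any lines after the final 'TR ID' line are never yielded, because a sublist is only emitted when its end line is seen.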

Answer 2: (score: 0)

lines = ['0 01 31DEC', '18:19:08', 'TR ID: 308', '0 02 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308', '0 03 31DEC',
         '18:19:08', 'TR ID: 308', '0 04 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308']

# Positions of every start line, plus len(lines) so the final slice keeps the last element
indices = [i for i, line in enumerate(lines) if line.startswith('0')] + [len(lines)]
result = [lines[indices[i]:indices[i + 1]] for i in range(len(indices) - 1)]

Answer 3: (score: 0)

I would take a recursive approach:

lines = ['0 01 31DEC', '18:19:08', 'TR ID: 308', '0 02 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308', '0 03 31DEC', '18:19:08', 'TR ID: 308', '0 04 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308']

final_result = []

def hello(lines_1):
    data = []
    if not lines_1:
        return 0
    for j, i in enumerate(lines_1):
        data.append(i)
        if i.startswith('TR ID'):
            # Sublist complete: store it and recurse on the remaining lines
            final_result.append(data)
            return hello(lines_1[j + 1:])

hello(lines)

print(final_result)

Output:

[['0 01 31DEC', '18:19:08', 'TR ID: 308'], ['0 02 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308'], ['0 03 31DEC', '18:19:08', 'TR ID: 308'], ['0 04 31DEC', '18:19:08', 'ATC ID: 21232', 'TR ID: 308']]