Chunking a list by delimiter in Python

Date: 2013-01-29 22:43:23

Tags: python list parsing list-comprehension list-manipulation

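What is the idiomatic way to chunk a list of the form ["record_a:", "x"*N, "record_b:", "y"*M, ...], that is, a list in which the start of each record is marked by a string ending in ":", and each record includes all elements up to the next record? So the following list: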

["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]

should be split into (with the trailing ":" dropped from each record name):

[["record_a", "a", "b"], ["record_b", "1", "2", "3", "4"]]

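The list contains an arbitrary number of records, and each record contains an arbitrary number of items (up to the start of the next record, or the end of the list). How can this be done efficiently?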

4 answers:

Answer 0 (score: 4)

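lst = ["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]
out = []
for x in lst:
    if x.endswith(':'):
        # A header starts a new record; drop the trailing ':' from its name
        out.append([x[:-1]])
    else:
        # Any other item belongs to the most recent record (assumes lst starts with a header)
        out[-1].append(x)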
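The loop above assumes the list starts with a header; if it does not, `out[-1]` raises an `IndexError` on the very first item. A minimal sketch of the same loop wrapped in a function that reports this case explicitly (the name `chunk_by_header` and the choice of `ValueError` are illustrative, not part of the answer):

```python
def chunk_by_header(items):
    """Split items into records; each record starts at an element ending in ':'."""
    out = []
    for x in items:
        if x.endswith(":"):
            # Start a new record named after the header, minus its trailing ':'
            out.append([x[:-1]])
        elif out:
            # Ordinary item: append it to the current record
            out[-1].append(x)
        else:
            # Hypothetical policy: items before the first header are an error
            raise ValueError(f"item {x!r} appears before any record header")
    return out

print(chunk_by_header(["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]))
# [['record_a', 'a', 'b'], ['record_b', '1', '2', '3', '4']]
```

Raising early keeps the failure message tied to the offending item instead of a bare `IndexError`; silently dropping or collecting leading items would be equally valid choices.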

Answer 1 (score: 4)

Using a generator:

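def chunkRecords(records):
    record = []
    for r in records:
        if r.endswith(':'):
            # New header: emit the finished record, then start a new one
            if record:
                yield record
            record = [r[:-1]]
        else:
            record.append(r)
    # Emit the last record once the input is exhausted
    if record:
        yield record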

Then loop over it:

for record in chunkRecords(records):
    # record is a list, e.g. ['record_a', 'a', 'b']
    print(record)

Or turn it back into a list:

records = list(chunkRecords(records))

The latter produces:

>>> records = ["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]
>>> records = list(chunkRecords(records))
>>> records
[['record_a', 'a', 'b'], ['record_b', '1', '2', '3', '4']]
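If the records will be looked up by name, the generator's output converts directly into a dict. A sketch assuming the list starts with a header and that record names are unique (a repeated name silently overwrites the earlier record); the generator is repeated so the block runs on its own, and `by_name` is just an illustrative variable name:

```python
def chunkRecords(records):
    record = []
    for r in records:
        if r.endswith(":"):
            # New header: emit the finished record, then start a new one
            if record:
                yield record
            record = [r[:-1]]
        else:
            record.append(r)
    if record:
        yield record

records = ["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]
# First element of each chunk is the name, the rest are its items
by_name = {rec[0]: rec[1:] for rec in chunkRecords(records)}
print(by_name)
# {'record_a': ['a', 'b'], 'record_b': ['1', '2', '3', '4']}
```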

Answer 2 (score: 1)

OK, here is my slightly crazy, end-of-the-workday itertools solution:

>>> from itertools import groupby
>>> d = ["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]
>>> groups = (list(g) for _, g in groupby(d, lambda x: x.endswith(":")))
>>> git = iter(groups)
>>> paired = zip(git, git)  # take the groups two at a time: (headers, items)
>>> combined = [[a[0][:-1]] + b for a, b in paired]
>>> combined
[['record_a', 'a', 'b'], ['record_b', '1', '2', '3', '4']]

(This is meant as an example of what can be done, not as code I would necessarily use.)
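Pairing the `groupby` runs two at a time only works if header runs and item runs strictly alternate. `groupby` merges adjacent headers into a single run, so a record with no items throws the pairing off, and an empty record at the end of the list is dropped. A small sketch that reproduces this (the wrapper name `groupby_pairs` is hypothetical; the body follows the approach in this answer):

```python
from itertools import groupby

def groupby_pairs(d):
    # Alternate runs of headers / non-headers, then pair them up via zip
    groups = (list(g) for _, g in groupby(d, lambda x: x.endswith(":")))
    git = iter(groups)
    return [[a[0][:-1]] + b for a, b in zip(git, git)]

print(groupby_pairs(["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]))
# [['record_a', 'a', 'b'], ['record_b', '1', '2', '3', '4']]

# Two adjacent headers land in one run, so record_b disappears:
print(groupby_pairs(["record_a:", "record_b:", "1"]))
# [['record_a', '1']]

# A trailing header has no item run to pair with, so it is dropped:
print(groupby_pairs(["record_a:", "a", "record_b:"]))
# [['record_a', 'a']]
```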

Answer 3 (score: 1)

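from itertools import groupby, chain

l = ["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]

# zip the same generator with itself so each header run is paired with the item run after it
# (the original Python 2 version used itertools.izip, which no longer exists in Python 3)
[list(chain([x[0][0].strip(':')], x[1])) for x in zip(*[(list(g)
            for _, g in groupby(l, lambda x: x.endswith(':')))] * 2)]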

Output:

[['record_a', 'a', 'b'], ['record_b', '1', '2', '3', '4']]
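One way to keep an itertools-only approach while still handling records with no items (where adjacent headers would otherwise end up in the same `groupby` run) is to tag every element with the number of headers seen so far, so that a header and the items following it share one key. A minimal sketch using `itertools.accumulate` (Python 3; the function name `chunk_records` is hypothetical and this technique is not taken from any of the answers above):

```python
from itertools import accumulate, groupby

def chunk_records(items):
    # Running count of headers seen so far: each header bumps the count,
    # so a header and the items after it all share the same key.
    keys = accumulate(int(x.endswith(":")) for x in items)
    chunks = []
    for _, run in groupby(zip(keys, items), key=lambda pair: pair[0]):
        chunk = [x for _, x in run]
        # A chunk that does not start with a header holds items that came
        # before the first header (key 0); it is kept unchanged.
        if chunk[0].endswith(":"):
            chunk[0] = chunk[0][:-1]  # drop the ':' from the record name
        chunks.append(chunk)
    return chunks

print(chunk_records(["record_a:", "a", "b", "record_b:", "1", "2", "3", "4"]))
# [['record_a', 'a', 'b'], ['record_b', '1', '2', '3', '4']]
print(chunk_records(["record_a:", "record_b:", "1"]))
# [['record_a'], ['record_b', '1']]
```

Because the grouping key comes from a running count rather than from the header test itself, consecutive headers get distinct keys and no pairing step is needed.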