Splitting a Python list given start indices

Date: 2016-09-30 02:57:32

Tags: python string list

I have seen this question: Split list into sublist based on index ranges

But my problem is slightly different. I have a list:


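I need to split it into sublists based on the dates. It is essentially an event log, but because of poor database design the system concatenates the separate update messages for each event into one big list of strings. I have: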


My example would give:


Now I need to split the list into separate lists based on those indices. For my example, I would ideally like to get:


So the format is:


There are also some edge cases with no date string, where the format is:

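For comparison with the index-based approach the title asks about, here is a minimal sketch. It collects the start index of every date entry, then slices between consecutive start indices. It assumes dates match the same loose \d+-\d+-\d+ pattern the answers below use, and it files any leading undated entries under an empty-string date. The name split_at_dates is hypothetical. The sample list is the normal_case list from the second answer.

```python
import re

# Assumption: dates match the same loose pattern used in the answers
DATE_RE = re.compile(r"\d+-\d+-\d+")


def split_at_dates(items):
    """Split items into [date, events] pairs using each date's start index."""
    # Start index of every entry that looks like a date
    starts = [i for i, item in enumerate(items) if DATE_RE.match(item)]
    result = []
    # Edge case: entries before the first date (or no dates at all)
    first = starts[0] if starts else len(items)
    if first > 0:
        result.append(['', items[:first]])
    # Each date's events run up to the next start index (or the end of the list)
    for start, end in zip(starts, starts[1:] + [len(items)]):
        result.append([items[start], items[start + 1:end]])
    return result


log = ['2016-01-01', 'stuff happened', 'details',
       '2016-01-02', 'more stuff happened', 'details', 'report']
print(split_at_dates(log))
# [['2016-01-01', ['stuff happened', 'details']],
#  ['2016-01-02', ['more stuff happened', 'details', 'report']]]
```

Pairing starts with starts[1:] + [len(items)] gives each date the half-open slice that ends where the next date begins. This is the same idea as splitting on index ranges.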

2 answers:

Answer 0 (score: 1)

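You don't need a two-pass approach (first computing the indices, then splitting on them) at all. itertools.groupby can break the input into dates and their associated events in a single pass. Because this avoids computing indices and then slicing the list with them, you can also process a generator that supplies one value at a time, which avoids memory problems if your input is large. To demonstrate, I've taken the original List and extended it so that it exercises the edge cases: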
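The key idea is that groupby with a boolean key splits the input into alternating runs of date and non-date entries. Here is a small illustration using a few entries from the answer's List. The name runs is only for illustration, and the pattern is equivalent to the answer's datere.

```python
import re
from itertools import groupby

# Equivalent to the answer's datere (escaping '-' is optional here)
datere = re.compile(r"\d+-\d+-\d+")

# A few entries taken from the answer's List
items = ['undated', '2015-12-31', '2016-01-01', 'stuff happened', 'details']

# groupby merges only *consecutive* items with equal keys, so a boolean
# key produces alternating runs of dates (True) and non-dates (False)
runs = [(isdate, list(g))
        for isdate, g in groupby(items, key=lambda x: datere.match(x) is not None)]
print(runs)
# [(False, ['undated']), (True, ['2015-12-31', '2016-01-01']),
#  (False, ['stuff happened', 'details'])]
```

Only the last date in each True run can own events, because the very next run (if any) is the non-date run that follows it. That is why the answer delays its loop by one item.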

import re

from itertools import groupby

List = ['undated', 'garbage', 'then', 'twodates', '2015-12-31',
        '2016-01-01', 'stuff happened', 'details', 
        '2016-01-02', 'more stuff happened', 'details', 'report',
        '2016-01-03']

datere = re.compile(r"\d+\-\d+\-\d+")  # Precompile regex for speed
def group_by_date(it):
    # Make an iterator that groups consecutive dates together and consecutive non-dates together
    grouped = groupby(it, key=lambda x: datere.match(x) is not None)
    for isdate, g in grouped:
        if not isdate:
            # We had a leading set of undated events, output as undated
            yield ['', list(g)]
        else:
            # At least one date found; iterate with one loop delay
            # so final date can have events included (all others have no events)
            lastdate = next(g)
            for date in g:
                yield [lastdate, []]
                lastdate = date

            # Final date pulls next group (which must be events or the end of the input)
            try:
                # Get next group of events
                events = list(next(grouped)[1])
            except StopIteration:
                # There were no events for final date
                yield [lastdate, []]
            else:
                # There were events associated with final date
                yield [lastdate, events]

print(list(group_by_date(List)))

Output (newlines added for readability):

[['', ['undated', 'garbage', 'then', 'twodates']],
 ['2015-12-31', []],
 ['2016-01-01', ['stuff happened', 'details']],
 ['2016-01-02', ['more stuff happened', 'details', 'report']],
 ['2016-01-03', []]]
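One detail of the code above is worth knowing. The groups that groupby yields share the underlying iterator, so each group has to be consumed before the groupby object is advanced. This is why the answer iterates over every date in g before calling next(grouped), and why it wraps the events group in list(...) immediately. A small demonstration, assuming CPython 3.7 or later, where a stale group simply yields nothing:

```python
import re
from itertools import groupby

datere = re.compile(r"\d+-\d+-\d+")
data = ['2016-01-01', 'stuff happened', 'details']
grouped = groupby(data, key=lambda x: datere.match(x) is not None)

_, first_group = next(grouped)   # the run holding '2016-01-01'
_, second_group = next(grouped)  # advancing the groupby object...

stale = list(first_group)   # ...leaves the earlier group empty
fresh = list(second_group)  # the current group is still intact
print(stale, fresh)
# [] ['stuff happened', 'details']
```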

Answer 1 (score: 1)

Try:

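import re

def split_by_date(arr, patt=r'\d+\-\d+\-\d+'):
    results = []
    srch = re.compile(patt)
    rec = ['', []]  # current record; '' collects events before the first date
    for item in arr:
        if srch.match(item):
            # A date starts a new record; keep the previous one if it is non-empty
            if rec[0] or rec[1]:
                results.append(rec)
            rec = [item, []]
        else:
            rec[1].append(item)
    # Flush the last record
    if rec[0] or rec[1]:
        results.append(rec)
    return results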

Then:

normal_case = ['2016-01-01', 'stuff happened', 'details', 
               '2016-01-02', 'more stuff happened', 'details', 'report']
special_case_1 = ['blah', 'blah', 'stuff', '2016-11-11']
special_case_2 = ['blah', 'blah', '2015/01/01', 'blah', 'blah']

print(split_by_date(normal_case))
print(split_by_date(special_case_1))
print(split_by_date(special_case_2, r'\d+\/\d+\/\d+'))
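The same single pass can also be written as a generator, so that it works on a lazy iterator (the memory point made in the first answer). This is an adaptation, not the answerer's code. The name iter_by_date is hypothetical, and the sketch assumes a date entry is the whole string in YYYY-MM-DD form. It uses re.fullmatch so that an event message that merely starts with a date is not mistaken for a date entry. re.match with \d+-\d+-\d+ would accept such a message.

```python
import re
from typing import Iterable, Iterator

# Assumption: a date entry is the whole string in YYYY-MM-DD form
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def iter_by_date(items: Iterable[str], patt=ISO_DATE) -> Iterator[list]:
    """Lazily yield [date, events] records in a single pass (hypothetical helper)."""
    rec = ['', []]  # '' collects any events that appear before the first date
    for item in items:
        # fullmatch: '2016-01-01 stuff happened' counts as an event, not a date
        if patt.fullmatch(item):
            if rec[0] or rec[1]:
                yield rec  # the previous record is complete
            rec = [item, []]
        else:
            rec[1].append(item)
    if rec[0] or rec[1]:
        yield rec  # flush the final record


# Works on any iterable, including a one-shot iterator
lines = iter(['blah', '2016-01-01 stuff happened', '2016-01-01', 'details'])
result = list(iter_by_date(lines))
print(result)
# [['', ['blah', '2016-01-01 stuff happened']], ['2016-01-01', ['details']]]
```

For the slash-separated case, pass a different compiled pattern, for example iter_by_date(special_case_2, re.compile(r"\d+/\d+/\d+")).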