How to use itertools.groupby on a file

Time: 2017-05-08 04:25:47

Tags: python

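I have a file like the one shown below. I plan to use itertools.groupby to build a list of lists, one inner list per block of data, but I can't work out the key function that splits the lines into those blocks.

Any ideas?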

with open(infile) as f:
    blocks = []
    for key, val in itertools.groupby(f, lambda x: ...):  # which key function goes here?
        if key:
            blocks.append(list(val))

Input:

Timestamp         : 2017-02-17 06:41:33.163000 EST
Event             : fabric
DataFields        : Zen
Timestamp         : 2017-02-17 06:41:33.163000 EST
Event             : application
DataFields        : Flood1
Timestamp         : 2017-02-17 06:41:33.163000 EST
Event             : fabric
DataFields        : Flood2
Timestamp         : 2017-02-17 06:41:33.163000 EST
Event             : application
DataFields        : Flood3 

Output: should be a list of lists

[list1,list2,list3,list4]

list1 = [Timestamp         : 2017-02-17 06:41:33.163000 EST, Event             : fabric, DataFields        : Zen]
list2 = [Timestamp         : 2017-02-17 06:41:33.163000 EST, Event             : application, DataFields        : Flood1]
list3 = [Timestamp         : 2017-02-17 06:41:33.163000 EST, Event             : fabric, DataFields        : Flood2]
list4 = [Timestamp         : 2017-02-17 06:41:33.163000 EST, Event             : application, DataFields        : Flood3]

2 answers:

Answer 0 (score: 0)

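If you key groupby on whether a line starts with Timestamp, it alternates between yielding a group of timestamp lines and a group of non-timestamp lines. You can use that to start a new sublist at each timestamp and then extend it with the data lines that follow.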

import itertools

with open('test.txt') as f:
    blocks = []
    for is_timestamp, lines in itertools.groupby(
            (line.strip() for line in f), 
            lambda line: line.startswith('Timestamp')):
        if is_timestamp:
            # saw a timestamp - start a new inner list
            blocks.append(list(lines))
        else:
            # non-timestamp lines - append them to the current block
            blocks[-1].extend(lines)

for block in blocks:
    print(block)

Running the test, I get:

td@mintyfresh ~/tmp $ python3 test.py
['Timestamp         : 2017-02-17 06:41:33.163000 EST', 'Event             : fabric', 'DataFields        : Zen']
['Timestamp         : 2017-02-17 06:41:33.163000 EST', 'Event             : application', 'DataFields        : Flood1']
['Timestamp         : 2017-02-17 06:41:33.163000 EST', 'Event             : fabric', 'DataFields        : Flood2']
['Timestamp         : 2017-02-17 06:41:33.163000 EST', 'Event             : application', 'DataFields        : Flood3', '']
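The trailing '' in the last list most likely comes from a blank line at the end of the input file, which strip() turns into an empty string. The following is only a sketch of one possible variation, not the answer's code: it drops empty lines before grouping and uses a counting key function, so that groupby yields one group per record directly. The names split_blocks, block_key and marker are made up for illustration, and the sample lines are a shortened copy of the question's input.

```python
import itertools

def split_blocks(lines, marker="Timestamp"):
    """Group lines into blocks, each starting at a line that begins with `marker`."""
    block_no = 0

    def block_key(line):
        # Bump the counter at every marker line, so all lines of one
        # record share the same key and land in the same group.
        nonlocal block_no
        if line.startswith(marker):
            block_no += 1
        return block_no

    # Strip whitespace and drop empty lines before grouping.
    cleaned = (s for s in (line.strip() for line in lines) if s)
    return [list(group) for _, group in itertools.groupby(cleaned, block_key)]

# Shortened sample of the question's input, with a trailing blank line.
sample = [
    "Timestamp         : 2017-02-17 06:41:33.163000 EST\n",
    "Event             : fabric\n",
    "DataFields        : Zen\n",
    "Timestamp         : 2017-02-17 06:41:33.163000 EST\n",
    "Event             : application\n",
    "DataFields        : Flood1 \n",
    "\n",
]

blocks = split_blocks(sample)
for block in blocks:
    print(block)
```

An open file object can be passed in place of sample. Any lines that appear before the first Timestamp line would end up in a block of their own (key 0) rather than raising an IndexError.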

Answer 1 (score: 0)

Using itertools' chain and islice functions:

>>> import itertools
>>> def get_chunks(gdata, stop=2):
...     iterator = iter(gdata)
...     for first in iterator:
...         yield itertools.chain([first], itertools.islice(iterator, stop))
...
>>> with open('infile') as f:
...     blocks = []
...     for item in get_chunks(f):
...         blocks.append(list(item))
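Two things are worth noting about this approach: it relies on every record being exactly three lines long, and the lines keep their trailing newlines. Each chunk is also a lazy iterator over the shared file iterator, so it has to be consumed in full (as list(item) does here) before moving on to the next one. A minimal sketch of the same fixed-size idea that strips lines, skips blank ones and returns ready-made lists might look like this; read_records and size are hypothetical names, and the three-lines-per-record layout is an assumption taken from the sample input.

```python
import itertools

def read_records(lines, size=3):
    """Yield lists of up to `size` stripped, non-empty lines."""
    # Strip whitespace and skip blank lines so they don't shift the chunks.
    iterator = (s for s in (line.strip() for line in lines) if s)
    while True:
        # islice pulls at most `size` items from the shared iterator.
        record = list(itertools.islice(iterator, size))
        if not record:
            return  # input exhausted
        yield record

# Shortened sample of the question's input, with a trailing blank line.
sample = [
    "Timestamp         : 2017-02-17 06:41:33.163000 EST\n",
    "Event             : fabric\n",
    "DataFields        : Zen\n",
    "Timestamp         : 2017-02-17 06:41:33.163000 EST\n",
    "Event             : application\n",
    "DataFields        : Flood1\n",
    "\n",
]

records = list(read_records(sample))
for record in records:
    print(record)
```

If records can have a varying number of lines, grouping on the Timestamp marker is the safer choice, since fixed-size chunking would silently misalign every record after the first irregular one.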