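I have a file that looks like the sample below. I plan to use itertools.groupby to build a list of lists, one inner list per block of data, but I can't work out the key function that splits the lines into blocks.
Any ideas?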
with open(infile) as f:
    blocks = []
    for key, val in itertools.groupby(f, lambda x:):  # the key function is the missing piece
        if key:
            blocks.append(list(val))
Input:
Timestamp : 2017-02-17 06:41:33.163000 EST
Event : fabric
DataFields : Zen
Timestamp : 2017-02-17 06:41:33.163000 EST
Event : application
DataFields : Flood1
Timestamp : 2017-02-17 06:41:33.163000 EST
Event : fabric
DataFields : Flood2
Timestamp : 2017-02-17 06:41:33.163000 EST
Event : application
DataFields : Flood3
Output: should be a list of lists
[list1,list2,list3,list4]
list1 = [Timestamp : 2017-02-17 06:41:33.163000 EST, Event : fabric, DataFields : Zen]
list2 = [Timestamp : 2017-02-17 06:41:33.163000 EST, Event : application, DataFields : Flood1]
list3 = [Timestamp : 2017-02-17 06:41:33.163000 EST, Event : fabric, DataFields : Flood2]
list4 = [Timestamp : 2017-02-17 06:41:33.163000 EST, Event : application, DataFields : Flood3]
Answer 0: (score: 0)
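If you use groupby keyed on whether a line starts with Timestamp, it alternates between groups of timestamp lines and groups of non-timestamp lines. You can use that to start a new sublist at each timestamp group and extend it with the data lines that follow.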
import itertools

with open('test.txt') as f:
    blocks = []
    for is_timestamp, lines in itertools.groupby(
            (line.strip() for line in f),
            lambda line: line.startswith('Timestamp')):
        if is_timestamp:
            # saw a timestamp - start a new inner list
            blocks.append(list(lines))
        else:
            # extend the current inner list with the non-timestamp lines
            blocks[-1].extend(list(lines))

for block in blocks:
    print(block)
Running the test, I got:
td@mintyfresh ~/tmp $ python3 test.py
['Timestamp : 2017-02-17 06:41:33.163000 EST', 'Event : fabric', 'DataFields : Zen']
['Timestamp : 2017-02-17 06:41:33.163000 EST', 'Event : application', 'DataFields : Flood1']
['Timestamp : 2017-02-17 06:41:33.163000 EST', 'Event : fabric', 'DataFields : Flood2']
['Timestamp : 2017-02-17 06:41:33.163000 EST', 'Event : application', 'DataFields : Flood3', '']
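The trailing '' in the last block presumably comes from a blank line at the end of the test file. A minimal sketch of a variant that drops blank lines, and that also avoids an IndexError when the input does not begin with a Timestamp line, might look like this. The function name split_blocks and the in-memory SAMPLE text are made up for illustration; unlike the code above, it also gives each of several consecutive Timestamp lines its own block.

```python
import io
import itertools

# Hypothetical sample standing in for test.txt, including a trailing blank line.
SAMPLE = """\
Timestamp : 2017-02-17 06:41:33.163000 EST
Event : fabric
DataFields : Zen
Timestamp : 2017-02-17 06:41:33.163000 EST
Event : application
DataFields : Flood1

"""


def split_blocks(lines):
    """Group stripped, non-blank lines into blocks that start at a Timestamp line."""
    stripped = (line.strip() for line in lines)
    non_blank = (line for line in stripped if line)
    blocks = []
    for is_timestamp, group in itertools.groupby(
            non_blank, lambda line: line.startswith('Timestamp')):
        if is_timestamp:
            # Consecutive Timestamp lines land in one group; start a block for each.
            blocks.extend([line] for line in group)
        elif blocks:
            # Attach data lines to the most recent Timestamp block.
            blocks[-1].extend(group)
        # Data lines seen before any Timestamp line are silently skipped.
    return blocks


blocks = split_blocks(io.StringIO(SAMPLE))
for block in blocks:
    print(block)
```

Passing an open file object instead of io.StringIO(SAMPLE) should work the same way, since both yield lines when iterated.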
Answer 1: (score: 0)
Using itertools' chain and islice functions:
>>> import itertools
>>> def get_chunks(gdata, stop=2):
...     iterator = iter(gdata)
...     for first in iterator:
...         # one leading line plus the next `stop` lines form a chunk
...         yield itertools.chain([first], itertools.islice(iterator, stop))
...
>>> with open('infile') as f:
...     blocks = []
...     for item in get_chunks(f):
...         blocks.append(list(item))
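Note that this approach relies on every record being exactly three lines long, and each chunk has to be consumed (here with list) before the next one is requested, because the chunks share one underlying iterator. As a rough sketch under the same fixed-size assumption, the variant below yields ready-made lists and strips newlines and blank lines along the way. The names get_blocks, size, and SAMPLE are invented for this example.

```python
import io
import itertools

# Hypothetical sample standing in for the input file.
SAMPLE = """\
Timestamp : 2017-02-17 06:41:33.163000 EST
Event : fabric
DataFields : Zen
Timestamp : 2017-02-17 06:41:33.163000 EST
Event : application
DataFields : Flood1
"""


def get_blocks(lines, size=3):
    """Yield lists of `size` consecutive stripped, non-blank lines."""
    stripped = (line.strip() for line in lines)
    iterator = (line for line in stripped if line)
    while True:
        # islice pulls at most `size` items from the shared iterator.
        block = list(itertools.islice(iterator, size))
        if not block:
            return
        yield block


blocks = list(get_blocks(io.StringIO(SAMPLE)))
for block in blocks:
    print(block)
```

If the number of non-blank lines is not a multiple of size, the last block is simply shorter; records of varying length need the groupby approach instead.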