Say I have the following data:
data = [['John', 1], ['Ada', 2], ['Ada', 3], ['Paul', 4],
        ['Paul', 5], ['Paul', 6], ['Kat', 7], ['Kat', 8]]
I can group it by name with groupby:
In [37]:
from itertools import groupby, izip_longest
from operator import itemgetter
for name, g in groupby(data, key=itemgetter(0)):
    print name, list(g)
John [['John', 1]]
Ada [['Ada', 2], ['Ada', 3]]
Paul [['Paul', 4], ['Paul', 5], ['Paul', 6]]
Kat [['Kat', 7], ['Kat', 8]]
I can also use the grouper function from the itertools recipes to group every two entries. I'll copy/paste it here for reference:
In [38]:
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
for g in grouper(data, 2):
    print g
(['John', 1], ['Ada', 2])
(['Ada', 3], ['Paul', 4])
(['Paul', 5], ['Paul', 6])
(['Kat', 7], ['Kat', 8])
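But now I want to iterate over the data so that the first element contains the data for John and Ada, and the second element contains the data for Paul and Kat. In other words, I want to combine groupby and grouper like this: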
In [39]:
person_iterator = groupby(data, key=itemgetter(0))
for group_iterator in grouper(person_iterator, 2):
    print [(keyvalue[0], list(keyvalue[1])) for keyvalue in group_iterator]
But the output is not what I expected:
[('John', []), ('Ada', [['Ada', 2], ['Ada', 3]])]
[('Paul', []), ('Kat', [['Kat', 7], ['Kat', 8]])]
Why do John and Paul have empty lists? How can I fix this?
Answer 0 (score: 0)
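The group iterator that itertools.groupby yields alongside each key (keyvalue[1] in your code) is exhausted as soon as the groupby iterator advances to the next group. grouper pulls two items from the same groupby iterator before either group is read, so by the time the list comprehension runs, the first group of each pair has already been invalidated.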
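The behaviour described above can be reproduced without grouper at all. The following is a minimal sketch, written for Python 3 (an assumption, since the question uses Python 2 syntax), that advances the groupby iterator by hand; the variable names are illustrative only.

```python
from itertools import groupby
from operator import itemgetter

data = [['John', 1], ['Ada', 2], ['Ada', 3], ['Paul', 4]]

groups = groupby(data, key=itemgetter(0))
first_key, first_group = next(groups)    # key 'John' plus its group iterator
second_key, second_group = next(groups)  # advancing invalidates first_group

stale = list(first_group)   # read too late: nothing is left to yield
fresh = list(second_group)  # still the current group, so Ada's rows are here

print(first_key, stale)   # John []
print(second_key, fresh)  # Ada [['Ada', 2], ['Ada', 3]]
```

This is the same thing that happens inside grouper, which fetches both members of a pair from groupby before your loop body touches either group.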
To prevent this, you need to convert each group iterator to a sequence before passing it to grouper:
person_iterator = ((key, list(grp)) for key, grp in groupby(data, key=itemgetter(0)))
for group_iterator in grouper(person_iterator, 2):
    print [(key, value) for key, value in group_iterator]
Output:
[('John', [['John', 1]]), ('Ada', [['Ada', 2], ['Ada', 3]])]
[('Paul', [['Paul', 4], ['Paul', 5], ['Paul', 6]]), ('Kat', [['Kat', 7], ['Kat', 8]])]
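For readers on Python 3, the same approach might look like the sketch below. It assumes zip_longest (the Python 3 name for izip_longest) and wraps the logic in a hypothetical helper, grouped_pairs, which is not part of the original answer. It also drops the None padding that grouper adds when the number of names is not a multiple of n, which is a design choice of this sketch rather than something the answer requires.

```python
from itertools import groupby, zip_longest
from operator import itemgetter


def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


def grouped_pairs(rows, n=2):
    # Hypothetical helper: materialize each group as a list *before*
    # grouper advances groupby, so no group is read after it goes stale.
    materialized = ((key, list(grp))
                    for key, grp in groupby(rows, key=itemgetter(0)))
    # Drop the None fill values that pad an incomplete final chunk.
    return [[pair for pair in chunk if pair is not None]
            for chunk in grouper(materialized, n)]


data = [['John', 1], ['Ada', 2], ['Ada', 3], ['Paul', 4],
        ['Paul', 5], ['Paul', 6], ['Kat', 7], ['Kat', 8]]

for chunk in grouped_pairs(data):
    print(chunk)
```

The generator expression stays lazy, but each item it yields already holds a fully built list, so it does not matter that grouper requests the next name before the previous one is printed.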