Question

给出以下列表：

[
    ('A', '', Decimal('4.0000000000'), 1330, datetime.datetime(2012, 6, 8, 0, 0)),
    ('B', '', Decimal('31.0000000000'), 1330, datetime.datetime(2012, 6, 4, 0, 0)),
    ('AA', 'C', Decimal('31.0000000000'), 1330, datetime.datetime(2012, 5, 31, 0, 0)),
    ('B', '', Decimal('7.0000000000'), 1330, datetime.datetime(2012, 5, 24, 0, 0)),
    ('A', '', Decimal('21.0000000000'), 1330, datetime.datetime(2012, 5, 14, 0, 0))
]

我想通过元组中的第一，第二，第四和第五列对它们进行分组，并将第三列相加。对于此示例，我将列命名为col1，col2，col3，col4，col5。

在SQL中我会做这样的事情：

select col1, col2, sum(col3), col4, col5 from my table
group by col1, col2, col4, col5

这样做是否有“酷”方式，还是手动循环？

Answer 1

你想要itertools.groupby。

请注意groupby期望对输入进行排序，因此您可能需要事先做到：

keyfunc = lambda t: (t[0], t[1], t[3], t[4])
data.sort(key=keyfunc)
for key, rows in itertools.groupby(data, keyfunc):
    print key, sum(r[2] for r in rows)

Answer 2

>>> [(x[0:2] + (sum(z[2] for z in y),) + x[2:5]) for (x, y) in
      itertools.groupby(sorted(L, key=operator.itemgetter(0, 1, 3, 4)),
      key=operator.itemgetter(0, 1, 3, 4))]
[
  ('A', '', Decimal('21.0000000000'), 1330, datetime.datetime(2012, 5, 14, 0, 0)),
  ('A', '', Decimal('4.0000000000'), 1330, datetime.datetime(2012, 6, 8, 0, 0)),
  ('AA', 'C', Decimal('31.0000000000'), 1330, datetime.datetime(2012, 5, 31, 0, 0)),
  ('B', '', Decimal('7.0000000000'), 1330, datetime.datetime(2012, 5, 24, 0, 0)),
  ('B', '', Decimal('31.0000000000'), 1330, datetime.datetime(2012, 6, 4, 0, 0))
]

（注意：输出重新格式化）

Answer 3

如果您发现自己在使用大型数据集时会做很多事情，那么您可能需要查看pandas库，它有很多很好的工具来完成这类工作。

Python - 分组并汇总元组列表

3 个答案: