Question

I'm querying result-set data as a tuple of dicts. I want to group the data, based on a specific condition, into a tuple of tuples of dicts.

Actual output:

({'col1': 2014},
 {'col1': 2013},
 {'col1': 2014},
 {'col1': 2013},
 {'col1': 2015},
 {'col2': '24'})

Expected output (here we group by year):

(({'col1': 2014}, {'col1': 2014}),
 ({'col1': 2013}, {'col1': 2013}),
 ({'col1': 2015}, {'col2': '24'}))

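Please advise how to get the data in this shape while we are processing the query, rather than processing the records one by one afterwards and converting them to this format.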

Answer 1

You can sort the dicts by year and then use groupby with the year as the key:

>>> from itertools import groupby
>>> t = ({'col1':2014},{'col1':2013},{'col1':2014},{'col1':2013},{'col1':2015})
>>> key = lambda x: x['col1']
>>> tuple(tuple(g) for k, g in groupby(sorted(t, key=key), key))
(({'col1': 2013}, {'col1': 2013}), ({'col1': 2014}, {'col1': 2014}), ({'col1': 2015},))

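groupby groups consecutive elements that share the same key and yields (key, iterable) pairs. Each iterable is then turned into a tuple inside a generator expression, which is passed as the argument to tuple.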
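Because groupby only merges adjacent items with equal keys, the sort step is what makes each year a single group. A quick check on the same sample data (same `key` lambda as above) shows the difference:

```python
from itertools import groupby

t = ({'col1': 2014}, {'col1': 2013}, {'col1': 2014}, {'col1': 2013}, {'col1': 2015})
key = lambda x: x['col1']

# Without sorting, equal years that are not next to each other land in separate groups.
unsorted_groups = tuple(tuple(g) for k, g in groupby(t, key))
print(len(unsorted_groups))  # 5

# After sorting by the same key, each year forms exactly one group.
sorted_groups = tuple(tuple(g) for k, g in groupby(sorted(t, key=key), key))
print(len(sorted_groups))  # 3
```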

Update: the one-liner above has O(n log n) time complexity because it sorts the data. With defaultdict the task can be done in O(n) time, at the cost of a few more lines:

>>> from collections import defaultdict
>>> t = ({'col1':2014},{'col1':2013},{'col1':2014},{'col1':2013},{'col1':2015})
>>> dd = defaultdict(list)
>>> for d in t:
...     dd[d['col1']].append(d)
...
>>> tuple(tuple(v) for v in dd.values())
(({'col1': 2014}, {'col1': 2014}), ({'col1': 2013}, {'col1': 2013}), ({'col1': 2015},))

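Note that the groups come out in dict iteration order: before Python 3.7 that order is arbitrary, since dicts are unordered there; from Python 3.7 on, dicts preserve insertion order, so the groups appear in the order each year is first seen. If you need to process "complete" groups (exactly one group per year) and you can't get the DB to return the data in sorted order, this is the best you can do.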
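If the groups should come out in year order on any Python version, one option (a sketch, not part of the approach above) is to keep the single O(n) bucketing pass and sort only the distinct years afterwards, which costs O(k log k) for k distinct years:

```python
from collections import defaultdict

t = ({'col1': 2014}, {'col1': 2013}, {'col1': 2014}, {'col1': 2013}, {'col1': 2015})

# One pass over the rows: bucket each dict under its year.
dd = defaultdict(list)
for d in t:
    dd[d['col1']].append(d)

# Sort only the distinct years, then emit the buckets in that order.
result = tuple(tuple(dd[year]) for year in sorted(dd))
print(result)
```

When there are far fewer distinct years than rows, this stays close to linear while giving a deterministic order.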

If you can get the data from the DB in sorted order, in batches, you can still use groupby without fetching everything up front:

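from itertools import groupby

# Stand-in for a DB cursor that returns rows already sorted by year
cursor = iter([2013, 2013, 2014, 2014, 2014, 2015, 2015])

def get_batch():
    # Simulate fetching up to 3 rows from the DB in one round trip
    batch = []
    try:
        for _ in range(3):
            batch.append({'col1': next(cursor)})
    except StopIteration:
        pass

    print('Got batch')
    return batch

def fetch():
    # Yield rows one at a time, requesting a new batch only when the previous one is used up
    while True:
        batch = get_batch()
        if not batch:
            break

        yield from batch

# groupby consumes rows lazily, so only the batches needed to finish the current group are fetched
for k, g in groupby(fetch(), lambda x: x['col1']):
    print('Group: {}'.format(tuple(g)))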

Output:

Got batch
Group: ({'col1': 2013}, {'col1': 2013})
Got batch
Group: ({'col1': 2014}, {'col1': 2014}, {'col1': 2014})
Got batch
Got batch
Group: ({'col1': 2015}, {'col1': 2015})
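The same pattern can be wired to a real DB-API cursor. A minimal sketch, assuming the standard-library sqlite3 module as a stand-in for whatever database the question uses (the table name `records`, the `dict_factory` helper and the batch size of 3 are hypothetical): `ORDER BY` puts equal years next to each other, `fetchmany()` pulls rows in batches, and groupby forms the groups as the rows stream in.

```python
import sqlite3
from itertools import groupby

# In-memory stand-in for the real database; the table name "records" is hypothetical.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE records (col1 INTEGER)')
conn.executemany('INSERT INTO records (col1) VALUES (?)',
                 [(2014,), (2013,), (2014,), (2013,), (2015,)])

def dict_factory(cursor, row):
    # Hypothetical helper: build a {column_name: value} dict per row, like the question's result set.
    return {desc[0]: value for desc, value in zip(cursor.description, row)}

conn.row_factory = dict_factory

def fetch(cursor, size=3):
    # Yield rows one at a time while pulling them from the DB in batches of `size`.
    while True:
        batch = cursor.fetchmany(size)
        if not batch:
            break
        yield from batch

# ORDER BY makes equal years adjacent, which is what groupby needs.
cur = conn.execute('SELECT col1 FROM records ORDER BY col1')
result = tuple(tuple(g) for _, g in groupby(fetch(cur), key=lambda row: row['col1']))
print(result)
conn.close()
```

With another driver, the dict-building step and batch size would need adapting; `fetchmany()` itself is part of the DB-API 2.0 cursor interface.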
