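While reading the Python documentation, I came across the itertools.groupby() function. It was not very straightforward, so I decided to look up some information on Stack Overflow. I found something in "How do I use Python's itertools.groupby()?".
There seemed to be little information about it here or in the documentation, so I decided to post my observations for comments.
Thanks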
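Answer 0 (score: 11)
First, you can read the documentation for itertools.groupby.
I will put what I consider the most important point first. I hope the reason becomes clear after the examples.
Always sort the items by the same key that you use for grouping, to avoid unexpected results.
itertools.groupby(iterable, key=None or some func)
Takes an iterable and groups its items according to the specified key. The key is a function applied to each item, and its result is used as the heading for the group that item belongs to; consecutive items with the same key value end up in the same group.
The return value is an iterator of (key, group) pairs. It resembles a dictionary's items in that each key is paired with a group of values, although the same key can appear more than once.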
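To make the shape of the return value concrete, here is a minimal sketch. The word list and the choice of len as the key are illustrative assumptions, not taken from the documentation. It shows that groupby yields (key, group) pairs one at a time and that each group is itself an iterator that has to be consumed, for example with list().

```python
from itertools import groupby

# Illustrative data (an assumption for this sketch); already ordered by length.
words = ["a", "b", "cc", "dd", "eee"]

# groupby yields (key, group) pairs; each group is a lazy iterator,
# so it is materialized with list() before moving on to the next pair.
pairs = [(key, list(group)) for key, group in groupby(words, key=len)]
print(pairs)
# [(1, ['a', 'b']), (2, ['cc', 'dd']), (3, ['eee'])]
```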
Example 1
from itertools import groupby
# note here that the tuple counts as one item in this list. I did not
# specify any key, so each item in the list is a key on its own.
c = groupby(['goat', 'dog', 'cow', 1, 1, 2, 3, 11, 10, ('persons', 'man', 'woman')])
dic = {}
for k, v in c:
    dic[k] = list(v)
dic
Result
{1: [1, 1],
'goat': ['goat'],
3: [3],
'cow': ['cow'],
('persons', 'man', 'woman'): [('persons', 'man', 'woman')],
10: [10],
11: [11],
2: [2],
'dog': ['dog']}
Example 2
# notice here that mulato, cow and cat don't show up. Only the last group with a given key
# survives, because dic[k] is overwritten each time the same key comes around again.
# e.g. the final 'c' group (['camel']) replaces the earlier 'c' group (['cow', 'cat']).
list_things = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat', ('persons', 'man', 'woman'),
               'wombat', 'mongoose', 'malloo', 'camel']
c = groupby(list_things, key=lambda x: x[0])
dic = {}
for k, v in c:
    dic[k] = list(v)
dic
Result
{'c': ['camel'],
'd': ['dog', 'donkey'],
'g': ['goat'],
'm': ['mongoose', 'malloo'],
'persons': [('persons', 'man', 'woman')],
'w': ['wombat']}
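The missing items are easier to see when the groups are collected as a list of pairs instead of a dict. A small sketch using the same list_things as above; the names runs and repeated are only illustrative.

```python
from itertools import groupby

list_things = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat',
               ('persons', 'man', 'woman'),
               'wombat', 'mongoose', 'malloo', 'camel']

# Keep every (key, items) pair in order instead of overwriting dict entries.
runs = [(k, list(v)) for k, v in groupby(list_things, key=lambda x: x[0])]
for k, items in runs:
    print(k, items)

# Keys that occur in more than one run are the ones a plain dict would clobber.
keys = [k for k, _ in runs]
repeated = sorted({k for k in keys if keys.count(k) > 1})
print(repeated)  # ['c', 'm']
```

The keys 'c' and 'm' each appear in two separate runs, which is why only the later run survived in the dict above.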
Now the sorted version
# but observe the sorted version, where the data is first sorted on the same key used for grouping
list_things = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat', ('persons', 'man', 'woman'),
               'wombat', 'mongoose', 'malloo', 'camel']
sorted_list = sorted(list_things, key=lambda x: x[0])
print(sorted_list)
print()
c = groupby(sorted_list, key=lambda x: x[0])
dic = {}
for k, v in c:
    dic[k] = list(v)
dic
Result
['cow', 'cat', 'camel', 'dog', 'donkey', 'goat', 'mulato', 'mongoose', 'malloo', ('persons', 'man', 'woman'), 'wombat']
{'c': ['cow', 'cat', 'camel'],
'd': ['dog', 'donkey'],
'g': ['goat'],
'm': ['mulato', 'mongoose', 'malloo'],
'persons': [('persons', 'man', 'woman')],
'w': ['wombat']}
Example 3
things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"), ("vehicle", "harley"),
          ("vehicle", "speed boat"), ("vehicle", "school bus")]
dic = {}
f = lambda x: x[0]
for key, group in groupby(sorted(things, key=f), f):
    dic[key] = list(group)
dic
Result
{'animal': [('animal', 'bear'), ('animal', 'duck')],
'plant': [('plant', 'cactus')],
'vehicle': [('vehicle', 'harley'),
('vehicle', 'speed boat'),
('vehicle', 'school bus')]}
Now the sorted version again, this time with the input out of order. I changed the tuples to lists here; the result is the same either way.
things = [["animal", "bear"], ["animal", "duck"], ["vehicle", "harley"], ["plant", "cactus"],
          ["vehicle", "speed boat"], ["vehicle", "school bus"]]
dic = {}
f = lambda x: x[0]
for key, group in groupby(sorted(things, key=f), f):
    dic[key] = list(group)
dic
Result
{'animal': [['animal', 'bear'], ['animal', 'duck']],
'plant': [['plant', 'cactus']],
'vehicle': [['vehicle', 'harley'],
['vehicle', 'speed boat'],
['vehicle', 'school bus']]}
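Answer 1 (score: 5)
As always, the documentation of the function should be the first place to check. However, itertools.groupby is certainly one of the trickiest itertools because it has some possible pitfalls:
It only groups items whose key result is the same as that of the directly following items, that is, consecutive runs:
from itertools import groupby
for key, group in groupby([1,1,1,1,5,1,1,1,1,4]):
    print(key, list(group))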
# 1 [1, 1, 1, 1]
# 5 [5]
# 1 [1, 1, 1, 1]
# 4 [4]
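One can use sorted beforehand if one wants an overall groupby.
It yields two items, and the second one is an iterator (so one needs to iterate over the second item!). I explicitly had to convert these to a list in the previous example.
The second yielded element is discarded if one advances the groupby iterator: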
it = groupby([1,1,1,1,5,1,1,1,1,4])
key1, group1 = next(it)
key2, group2 = next(it)
print(key1, list(group1))
# 1 []
Even though group1 was not empty!
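One way around this, sketched here under the assumption that each group is small enough to hold in memory, is to materialize a group before advancing the outer iterator:

```python
from itertools import groupby

it = groupby([1, 1, 1, 1, 5, 1, 1, 1, 1, 4])

# Consume group1 into a list *before* calling next(it) again.
key1, group1 = next(it)
items1 = list(group1)

key2, group2 = next(it)
items2 = list(group2)

print(key1, items1)  # 1 [1, 1, 1, 1]
print(key2, items2)  # 5 [5]
```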
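As already mentioned, one can use sorted to do an overall groupby operation, but that is extremely inefficient (and it throws away the memory efficiency if you want to use groupby on generators). If you cannot guarantee that the input is sorted, there are better alternatives available that also avoid the O(n log(n)) sorting overhead.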
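One commonly used alternative is a single pass with collections.defaultdict. This is an assumption on my part and may not be the option the answer had in mind, and the helper name group_all is hypothetical. It collects all items with the same key regardless of their order in O(n) time, at the cost of holding every group in memory:

```python
from collections import defaultdict


def group_all(iterable, key=None):
    """Group every item by key in one pass; input order does not matter.

    Hypothetical helper for illustration, not part of itertools.
    """
    groups = defaultdict(list)
    for item in iterable:
        k = item if key is None else key(item)
        groups[k].append(item)
    return dict(groups)


result = group_all([1, 1, 1, 1, 5, 1, 1, 1, 1, 4])
print(result)
# {1: [1, 1, 1, 1, 1, 1, 1, 1], 5: [5], 4: [4]}
```

Unlike groupby, this returns one entry per key, so the two separate runs of 1 end up in a single list.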
However, groupby is great for checking local properties. There are two recipes in the itertools recipes section:
from itertools import groupby
from operator import itemgetter

def all_equal(iterable):
    "Returns True if all the elements are equal to each other"
    g = groupby(iterable)
    return next(g, True) and not next(g, False)
and
def unique_justseen(iterable, key=None):
    "List unique elements, preserving order. Remember only the element just seen."
    # unique_justseen('AAAABBBCCDAABBB') --> A B C D A B
    # unique_justseen('ABBCcAD', str.lower) --> A B C A D
    return map(next, map(itemgetter(1), groupby(iterable, key)))
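A short usage sketch of the two recipes. The definitions are repeated so that the snippet runs on its own; the sample inputs beyond those in the recipe comments are my own.

```python
from itertools import groupby
from operator import itemgetter


def all_equal(iterable):
    "Returns True if all the elements are equal to each other"
    g = groupby(iterable)
    # At most one group may exist: the first next() may succeed, the second must not.
    return next(g, True) and not next(g, False)


def unique_justseen(iterable, key=None):
    "List unique elements, preserving order. Remember only the element just seen."
    # Take the first item of every consecutive run.
    return map(next, map(itemgetter(1), groupby(iterable, key)))


print(all_equal([1, 1, 1]))  # True
print(all_equal([1, 2, 1]))  # False
print(all_equal([]))         # True

# unique_justseen returns a lazy map object, so wrap it in list() to see the items.
print(list(unique_justseen('AAAABBBCCDAABBB')))     # ['A', 'B', 'C', 'D', 'A', 'B']
print(list(unique_justseen('ABBCcAD', str.lower)))  # ['A', 'B', 'C', 'A', 'D']
```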