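I want to group all the lists into a tuple according to the last element of each list, and also count how many times each last element occurs. The challenge I ran into is that the lists can all have different sizes.
For example, given the input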
[['aa', 'b'], ['bb', 'c'], ['cc', 'b'], ['dd','ee','a'], ['ff', 'gg', 'hh', 'a']]
I am trying to get the output
('a', 2, [('dd','ee'),('ff', 'gg', 'hh')]), ( 'b', 2, [('aa'), ('cc')]), ( 'c', 1, [('bb')])
Eventually I want to go on and convert this into a pandas DataFrame. Any help or guidance would be much appreciated.
Answer 0 (score: 1)
A readable version
import itertools
import operator

mylist = [['aa', 'b'], ['bb', 'c'], ['cc', 'b'], ['dd','ee','a'], ['ff', 'gg', 'hh', 'a']]
mylist.sort(key=operator.itemgetter(-1))  # sort by last element
result = []
for k, g in itertools.groupby(mylist, key=operator.itemgetter(-1)):
    # remove last element from each sublist:
    g = [tuple(sublist[:-1]) for sublist in g]
    result.append((k, len(g), g))
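The sort on the first line matters: itertools.groupby only merges adjacent items that share a key, so unsorted input produces fragmented groups. A minimal sketch of the difference, using the input from the question (the names `data` and `last` are illustrative and not part of the answer):

```python
from itertools import groupby
from operator import itemgetter

data = [['aa', 'b'], ['bb', 'c'], ['cc', 'b'], ['dd', 'ee', 'a'], ['ff', 'gg', 'hh', 'a']]
last = itemgetter(-1)

# Without sorting, 'b' appears as two separate groups because its rows are not adjacent.
unsorted_keys = [k for k, _ in groupby(data, key=last)]

# After sorting by the same key, each key appears exactly once.
sorted_keys = [k for k, _ in groupby(sorted(data, key=last), key=last)]

print(unsorted_keys)  # ['b', 'c', 'b', 'a']
print(sorted_keys)    # ['a', 'b', 'c']
```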
Answer 1 (score: 0)
Without importing any libraries
# named `data` rather than `list` to avoid shadowing the built-in
data = [['aa', 'b'], ['bb', 'c'], ['cc', 'b'], ['dd','ee','a'], ['ff', 'gg', 'hh', 'a']]
instances = {}
for sublist in data:
    leading_elements, last_element = sublist[:-1], sublist[-1]
    instances.setdefault(last_element, [])
    instances[last_element].append(tuple(leading_elements))
result = tuple()
for key, val in instances.items():
    # wrap in a 1-tuple so each group is added as a single (key, count, items) entry
    result += ((key, len(val), val),)
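Note that iterating over instances.items() visits the keys in first-seen order (b, c, a), not the a, b, c order shown in the question. If that order matters, one option is to sort the keys while building the result. A sketch under that assumption; the helper name `group_by_last` is hypothetical and not part of the answer:

```python
def group_by_last(rows):
    """Group rows by their last element.

    Returns a tuple of (key, count, leading_items) entries sorted by key.
    """
    groups = {}
    for row in rows:
        # everything except the last element goes into that key's group
        groups.setdefault(row[-1], []).append(tuple(row[:-1]))
    # keys are unique, so sorting the items sorts by key only
    return tuple((key, len(items), items) for key, items in sorted(groups.items()))


data = [['aa', 'b'], ['bb', 'c'], ['cc', 'b'], ['dd', 'ee', 'a'], ['ff', 'gg', 'hh', 'a']]
result = group_by_last(data)
print(result)
```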
Answer 2 (score: -1)
Using itertools.groupby
>>> from itertools import groupby
>>> l = [['aa', 'b'], ['bb', 'c'], ['cc', 'b'], ['dd','ee','a'], ['ff', 'gg', 'hh', 'a']]
>>>
>>> f = lambda sl: sl[-1]
>>> res = [(k, [tuple(sl[:-1]) for sl in v]) for k,v in groupby(sorted(l, key=f), f)]
>>> res = [(k, len(v), v) for k,v in res]
>>> print(res)
[('a', 2, [('dd', 'ee'), ('ff', 'gg', 'hh')]), ('b', 2, [('aa',), ('cc',)]), ('c', 1, [('bb',)])]
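None of the answers cover the last step in the question, converting the grouped result to a pandas DataFrame. One possible sketch, assuming pandas is installed; the column names `key`, `count`, `items` and the helpers `to_records` and `to_dataframe` are illustrative choices, not something from the question or the answers:

```python
def to_records(groups):
    """Turn (key, count, items) tuples into one dict per row."""
    return [{'key': k, 'count': n, 'items': items} for k, n, items in groups]


def to_dataframe(groups):
    """Build a DataFrame with one row per group (requires pandas)."""
    import pandas as pd  # imported here so to_records works without pandas
    return pd.DataFrame(to_records(groups))


res = [('a', 2, [('dd', 'ee'), ('ff', 'gg', 'hh')]),
       ('b', 2, [('aa',), ('cc',)]),
       ('c', 1, [('bb',)])]
records = to_records(res)
```

Calling to_dataframe(res) should then give a three-row frame with one row per key. The `items` column holds Python lists, so pandas stores it with object dtype.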