I have the following DataFrame (df). Every column contains lists, except Type, which contains strings:
Type Components names
Zebra [hand,arm,nose] [bubu,kuku]
Zebra [eyes,fingers] [gaga,timber]
Zebra [paws] []
Lion [teeth] [scar]
Tiger [fingers] [figgy]
I want to group the rows by Type and concatenate the lists within each group, so that the output looks like this:
Type Components Names
Zebra [hand,arm,nose,eyes,fingers,paws] [bubu,kuku,gaga,timber]
Lion [teeth] [scar]
Tiger [fingers] [figgy]
I tried something like this:
df.groupby('Type')
I also could not get it to work with .agg.
Answer 0 (score: 1)
Option 1: groupby + sum
Not optimized, and does not remove duplicates.
df.groupby('Type', sort=False, as_index=False).sum()
Type Components names
0 Zebra [hand, arm, nose, eyes, fingers, paws] [bubu, kuku, gaga, timber]
1 Lion [teeth] [scar]
2 Tiger [fingers] [figgy]
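As for why this option is not optimized: summing a group of lists behaves roughly like repeated list concatenation, which copies the accumulated list at every step. The sketch below uses plain Python on the Zebra rows from the question to illustrate this. It is only an illustration of the idea, not a description of what pandas does internally, and the variable names are made up for the example.

```python
from functools import reduce
from itertools import chain
import operator

# The Components lists of the three Zebra rows from the question.
zebra_components = [["hand", "arm", "nose"], ["eyes", "fingers"], ["paws"]]

# Repeated `+` creates a new list at each step, so the total work
# grows quadratically with the number of items being merged.
concatenated = reduce(operator.add, zebra_components, [])

# chain.from_iterable visits each item exactly once.
flattened = list(chain.from_iterable(zebra_components))

print(concatenated)
print(flattened == concatenated)
```

For small groups the difference is negligible; it starts to matter only when a group holds many long lists.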
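Option 2: groupby + agg + itertools.chain
Removes duplicates and flattens the lists efficiently. Because it goes through a set, the original order of the items is not preserved.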
from itertools import chain
df.groupby('Type', sort=False, as_index=False).agg(
    lambda x: list(set(chain.from_iterable(x)))
)
Type Components names
0 Zebra [eyes, hand, paws, arm, fingers, nose] [timber, bubu, gaga, kuku]
1 Lion [teeth] [scar]
2 Tiger [fingers] [figgy]
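If the order of the items matters, one possible variation is to deduplicate with dict.fromkeys instead of set, since dictionaries keep insertion order in Python 3.7 and later. This is a minimal sketch: flatten_unique is a hypothetical helper name, and the repeated 'arm' in the sample data is added only to show the deduplication.

```python
from itertools import chain


def flatten_unique(lists):
    """Flatten an iterable of lists, dropping duplicates but keeping first-seen order."""
    return list(dict.fromkeys(chain.from_iterable(lists)))


# Zebra rows from the question, with a duplicate 'arm' added for illustration.
zebra_components = [["hand", "arm", "nose"], ["eyes", "fingers"], ["paws", "arm"]]

print(flatten_unique(zebra_components))
# ['hand', 'arm', 'nose', 'eyes', 'fingers', 'paws']
```

Passing flatten_unique to .agg in place of the lambda should give the same grouping with the items in their original order, assuming every cell in the aggregated columns is a list.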