Group by in a DataFrame whose columns contain lists

Time: 2018-04-19 15:07:49

Tags: python pandas dataframe pandas-groupby

I have the following DataFrame (df). Every column holds lists except Type, which holds strings.

Type    Components        names
Zebra  [hand,arm,nose]   [bubu,kuku]
Zebra   [eyes,fingers]   [gaga,timber]
Zebra   [paws]           []
Lion    [teeth]          [scar]
Tiger   [fingers]        [figgy]
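
For anyone who wants to reproduce this, the frame above could be rebuilt roughly as follows. The values are copied from the table; reading each bracketed cell as a list of strings is an assumption about how the data is stored.

```python
import pandas as pd

# Sample frame from the question: Type holds strings,
# Components and names hold Python lists (assumed to be lists of str).
df = pd.DataFrame({
    "Type": ["Zebra", "Zebra", "Zebra", "Lion", "Tiger"],
    "Components": [
        ["hand", "arm", "nose"],
        ["eyes", "fingers"],
        ["paws"],
        ["teeth"],
        ["fingers"],
    ],
    "names": [
        ["bubu", "kuku"],
        ["gaga", "timber"],
        [],  # the third Zebra row has no names
        ["scar"],
        ["figgy"],
    ],
})
print(df)
```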

I want to group the rows by Type so that the output looks like this:

Type    Components                           names
Zebra   [hand,arm,nose,eyes,fingers,paws]    [bubu,kuku,gaga,timber]
Lion    [teeth]                              [scar]
Tiger   [fingers]                            [figgy]

I tried something like this:

df.groupby('Type')

I also tried .agg, but could not get it to work.

1 Answer:

Answer 0 (score: 1)

Option 1
groupby + sum
Not optimized, and duplicates are kept

df.groupby('Type', sort=False, as_index=False).sum()

    Type                              Components                       names
0  Zebra  [hand, arm, nose, eyes, fingers, paws]  [bubu, kuku, gaga, timber]
1   Lion                                 [teeth]                      [scar]
2  Tiger                               [fingers]                     [figgy]
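
The reason sum works here is that adding two Python lists concatenates them, so summing the cells of a group joins the lists in row order. A minimal sketch on plain lists, independent of pandas (the variable names are only illustrative):

```python
# The three Components cells of the Zebra group, in row order.
cells = [["hand", "arm", "nose"], ["eyes", "fingers"], ["paws"]]

# sum() with an empty-list start value applies + repeatedly,
# and + on lists means concatenation.
combined = sum(cells, [])
print(combined)
# ['hand', 'arm', 'nose', 'eyes', 'fingers', 'paws']
```

Each + builds a brand-new list, so the cost grows quadratically with the number of rows in a group, which is why this option is described as not optimized.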

Option 2
groupby + agg + itertools.chain
Removes duplicates and flattens the lists very efficiently

from itertools import chain
df.groupby('Type', sort=False, as_index=False).agg(
    lambda x: list(set(chain.from_iterable(x)))
)

    Type                              Components                       names
0  Zebra  [eyes, hand, paws, arm, fingers, nose]  [timber, bubu, gaga, kuku]
1   Lion                                 [teeth]                      [scar]
2  Tiger                               [fingers]                     [figgy]
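
Because set is unordered, the element order in the output above is arbitrary. If first-seen order matters, one possible variant (not part of the original answer) is to swap set for dict.fromkeys, which keeps insertion order in Python 3.7+. A sketch, assuming the sample frame from the question:

```python
from itertools import chain

import pandas as pd

# Sample frame rebuilt from the question (cells assumed to be lists of str).
df = pd.DataFrame({
    "Type": ["Zebra", "Zebra", "Zebra", "Lion", "Tiger"],
    "Components": [["hand", "arm", "nose"], ["eyes", "fingers"],
                   ["paws"], ["teeth"], ["fingers"]],
    "names": [["bubu", "kuku"], ["gaga", "timber"], [], ["scar"], ["figgy"]],
})

# chain.from_iterable flattens the lists of one group lazily;
# dict.fromkeys drops duplicates while keeping first-seen order.
out = df.groupby("Type", sort=False, as_index=False).agg(
    lambda x: list(dict.fromkeys(chain.from_iterable(x)))
)
print(out)
```

This only works when the list elements are hashable, which is also true of the set-based version.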