Grouping Pandas DataFrame rows by index

Time: 2019-07-15 06:43:39

Tags: python pandas dataframe

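I have a Pandas DataFrame, and I am trying to group its rows by column values and then merge some of the rows into a list. Let me elaborate:

The DataFrame I have looks like this: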

industry     index     entities
cars         0         ['Norway', 'it']
cars         0         ['Mercedes', 'they']
cars         0         ['it', 'EV', 'its']
nature       1         ['fox', 'it']
nature       1         ['them', 'rabbits']
nature       2         ['whale', 'it']

The desired DataFrame should look like this:

industry     index     entities
cars         0         [ ['Norway', 'it'], ['Mercedes', 'they'], ['it', 'EV', 'its'] ]
nature       1         [ ['fox', 'it'], ['them', 'rabbits'] ]
nature       2         ['whale', 'it']

Basically, I am trying to group the rows by industry and index while merging the values of the entities column into a list.

I have tried

df.groupby('industry')['index'].apply(list)

but it gives me a completely different result.

How can I accomplish what I want? Thanks.

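2 answers:

Answer 0: (score: 5)

After the groupby, select the entities column instead of index, so that entities is the column being collected, and group by the list ['industry', 'index'] in the groupby call rather than by industry alone: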

df = df.groupby(['industry', 'index'])['entities'].apply(list).reset_index()
print (df)
  industry  index                                         entities
0     cars      0  [[Norway, it], [Mercedes, they], [it, EV, its]]
1   nature      1                     [[fox, it], [them, rabbits]]
2   nature      2                                    [[whale, it]]
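
For anyone who wants to run the snippet above end to end, here is a minimal sketch that rebuilds the sample data from the question first. It assumes every cell in entities is already a real Python list; the variable name grouped is only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question; each 'entities' cell is a Python list.
df = pd.DataFrame({
    "industry": ["cars", "cars", "cars", "nature", "nature", "nature"],
    "index": [0, 0, 0, 1, 1, 2],
    "entities": [
        ["Norway", "it"],
        ["Mercedes", "they"],
        ["it", "EV", "its"],
        ["fox", "it"],
        ["them", "rabbits"],
        ["whale", "it"],
    ],
})

# Group on both key columns, then collect each group's 'entities' cells into one list.
grouped = (
    df.groupby(["industry", "index"])["entities"]
      .apply(list)
      .reset_index()
)
print(grouped)
```

Note that every group becomes a list of lists here, including the group that has only one row.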

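If the last value should not end up in a nested list, which happens because that group contains only one row, you can use a lambda function with an if-else that returns the single value as is: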

df1 = (df.groupby(['industry', 'index'])['entities']
         .apply(lambda x: x.tolist() if len(x) != 1 else x.iat[0])
         .reset_index())
print (df1)
  industry  index                                         entities
0     cars      0  [[Norway, it], [Mercedes, they], [it, EV, its]]
1   nature      1                     [[fox, it], [them, rabbits]]
2   nature      2                                      [whale, it]
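
The same if-else logic can also be written as a named function, which may be easier to read and test. This is only a sketch on the question's sample data; the helper name collect_or_unwrap is hypothetical and not part of pandas.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "industry": ["cars", "cars", "cars", "nature", "nature", "nature"],
    "index": [0, 0, 0, 1, 1, 2],
    "entities": [
        ["Norway", "it"],
        ["Mercedes", "they"],
        ["it", "EV", "its"],
        ["fox", "it"],
        ["them", "rabbits"],
        ["whale", "it"],
    ],
})

def collect_or_unwrap(cells):
    """Hypothetical helper: return the lone cell for one-row groups, else a list of cells."""
    values = cells.tolist()
    return values[0] if len(values) == 1 else values

result = (
    df.groupby(["industry", "index"])["entities"]
      .apply(collect_or_unwrap)
      .reset_index()
)
print(result)
```

One trade-off worth keeping in mind: the column then mixes flat lists and nested lists, so later code has to check which shape each cell has.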

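Edit:

If the entities column contains only string representations of lists, you can convert the values to lists with the ast module before applying the solutions above: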

print (type(df['entities'].iat[0]))
<class 'str'>

import ast
df['entities'] = df['entities'].apply(ast.literal_eval)

print (type(df['entities'].iat[0]))
<class 'list'>
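
As a small self-contained illustration of that conversion, the sketch below starts from a frame whose entities cells are strings and parses them before grouping. The frame raw and the isinstance guard are assumptions added for illustration; the guard simply leaves cells that are already lists untouched.

```python
import ast

import pandas as pd

# Hypothetical input where the lists were stored as text, e.g. after reading a CSV.
raw = pd.DataFrame({
    "industry": ["cars", "cars", "nature"],
    "index": [0, 0, 1],
    "entities": ["['Norway', 'it']", "['Mercedes', 'they']", "['fox', 'it']"],
})

# Parse each string into a real list; skip values that are not strings.
raw["entities"] = raw["entities"].apply(
    lambda value: ast.literal_eval(value) if isinstance(value, str) else value
)

parsed = (
    raw.groupby(["industry", "index"])["entities"]
       .apply(list)
       .reset_index()
)
print(parsed)
```

ast.literal_eval only evaluates Python literals, so it is a safer choice than eval for this kind of input.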

Answer 1: (score: 4)

Assuming the elements in entities are lists:

df.groupby(['industry', 'index'])['entities'].apply(lambda x: [l for l in x]).reset_index()

Output:

  industry  index                                         entities
0     cars      0  [[Norway, it], [Mercedes, they], [it, EV, its]]
1   nature      1                     [[fox, it], [them, rabbits]]
2   nature      2                                    [[whale, it]]
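
The list comprehension in this answer simply copies each group's cells into a new list, so it should give the same result as passing list directly. A quick check on a reduced, illustrative sample (the variable names are assumptions):

```python
import pandas as pd

# Reduced sample in the same shape as the question's data.
df = pd.DataFrame({
    "industry": ["cars", "cars", "nature"],
    "index": [0, 0, 1],
    "entities": [["Norway", "it"], ["Mercedes", "they"], ["fox", "it"]],
})

groups = df.groupby(["industry", "index"])["entities"]

# Variant from this answer: an explicit list comprehension per group.
by_comprehension = groups.apply(lambda x: [l for l in x]).reset_index()

# Shorter variant: let list() consume each group's values.
by_list = groups.apply(list).reset_index()

same = by_comprehension["entities"].tolist() == by_list["entities"].tolist()
print(same)
```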