Question

I have a Pandas DataFrame, and I'm trying to group rows by column values and then merge some of the rows into lists. Let me elaborate:

The DataFrame I have looks like this:

industry     index     entities
cars         0         ['Norway', 'it']
cars         0         ['Mercedes', 'they']
cars         0         ['it', 'EV', 'its']
nature       1         ['fox', 'it']
nature       1         ['them', 'rabbits']
nature       2         ['whale', 'it']

The desired DataFrame should look like this:

industry     index     entities
cars         0         [ ['Norway', 'it'], ['Mercedes', 'they'], ['it', 'EV', 'its'] ]
nature       1         [ ['fox', 'it'], ['them', 'rabbits'] ]
nature       2         ['whale', 'it']

Basically, I'm trying to group the rows by industry and index while merging the values of the entities column into a list.

I have tried

df.groupby('industry')['index'].apply(list)

but that gives me a completely different result.

How can I accomplish what I want? Thanks.

Answer 1

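You need to group by both columns by passing the list ['industry', 'index'] to groupby, and select the entities column after the groupby (instead of index) so that entities is the column collected into lists: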

df = df.groupby(['industry', 'index'])['entities'].apply(list).reset_index()
print (df)
  industry  index                                         entities
0     cars      0  [[Norway, it], [Mercedes, they], [it, EV, its]]
1   nature      1                     [[fox, it], [them, rabbits]]
2   nature      2                                    [[whale, it]]
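
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same groupby. It assumes the entities values are real Python lists rather than strings, and the names attempt and grouped are just mine for illustration. It also runs the attempt from the question, which groups by industry only and collects the index values, which is why the result looked so different.

```python
import pandas as pd

# Sample data from the question; entities is assumed to hold real Python lists.
df = pd.DataFrame({
    "industry": ["cars", "cars", "cars", "nature", "nature", "nature"],
    "index": [0, 0, 0, 1, 1, 2],
    "entities": [
        ["Norway", "it"],
        ["Mercedes", "they"],
        ["it", "EV", "its"],
        ["fox", "it"],
        ["them", "rabbits"],
        ["whale", "it"],
    ],
})

# The attempt from the question: one group per industry, and the values
# collected are those of the index column, not entities.
attempt = df.groupby("industry")["index"].apply(list)
print(attempt)

# Group by both key columns and collect the entities lists of each group.
grouped = (
    df.groupby(["industry", "index"])["entities"]
      .apply(list)
      .reset_index()
)
print(grouped)
```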

If the last value should not end up in a nested list, you can use a lambda function with if-else to handle groups that contain only one value:

df1 = (df.groupby(['industry', 'index'])['entities']
         .apply(lambda x: x.tolist() if len(x) != 1 else x.iat[0])
         .reset_index())
print (df1)
  industry  index                                         entities
0     cars      0  [[Norway, it], [Mercedes, they], [it, EV, its]]
1   nature      1                     [[fox, it], [them, rabbits]]
2   nature      2                                      [whale, it]
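
If the lambda is hard to read, the same idea can be written as a named function. This is only a sketch: collect_entities is a name I made up, and the sample frame is a shortened copy of the one in the question.

```python
import pandas as pd

def collect_entities(group):
    """Return the single list for a one-row group, otherwise a list of lists."""
    values = group.tolist()
    return values[0] if len(values) == 1 else values

# Shortened sample data: one group with two rows, one group with a single row.
df = pd.DataFrame({
    "industry": ["nature", "nature", "nature"],
    "index": [1, 1, 2],
    "entities": [["fox", "it"], ["them", "rabbits"], ["whale", "it"]],
})

result = (
    df.groupby(["industry", "index"])["entities"]
      .apply(collect_entities)
      .reset_index()
)
print(result)
```

Keep in mind that the column then mixes flat lists and lists of lists, so any later code has to check which shape it received.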

Edit:

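If the entities column only contains string representations of lists, you can convert the values to lists with the ast module before applying the solutions above: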

print (type(df['entities'].iat[0]))
<class 'str'>

import ast
df['entities'] = df['entities'].apply(ast.literal_eval)

print (type(df['entities'].iat[0]))
<class 'list'>
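
A slightly more defensive sketch of the same conversion, in case the column mixes strings and real lists. The helper to_list and the two-row frame are my own, not part of the question.

```python
import ast

import pandas as pd

# Hypothetical frame where entities holds string representations of lists.
df = pd.DataFrame({
    "industry": ["cars", "nature"],
    "index": [0, 2],
    "entities": ["['Norway', 'it']", "['whale', 'it']"],
})

def to_list(value):
    # Parse only strings; values that are already lists pass through unchanged.
    return ast.literal_eval(value) if isinstance(value, str) else value

df["entities"] = df["entities"].apply(to_list)
print(type(df["entities"].iat[0]))
```

Because of the isinstance check, running the conversion a second time is harmless.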

Answer 2

Assuming the elements in entities are lists:

df.groupby(['industry', 'index'])['entities'].apply(lambda x: [l for l in x]).reset_index()

Output:

  industry  index                                         entities
0     cars      0  [[Norway, it], [Mercedes, they], [it, EV, its]]
1   nature      1                     [[fox, it], [them, rabbits]]
2   nature      2                                    [[whale, it]]
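
As far as I can tell, the list comprehension here does the same job as passing list directly, since both just iterate over the values of each group. A quick sketch to check that on a shortened copy of the sample data (the variable names are mine):

```python
import pandas as pd

# Shortened sample data from the question.
df = pd.DataFrame({
    "industry": ["cars", "cars", "nature"],
    "index": [0, 0, 2],
    "entities": [["Norway", "it"], ["Mercedes", "they"], ["whale", "it"]],
})

keys = ["industry", "index"]

# Collect each group's values with the comprehension and with plain list.
with_lambda = df.groupby(keys)["entities"].apply(lambda x: [l for l in x])
with_list = df.groupby(keys)["entities"].apply(list)

# Compare the collected values group by group.
same = with_lambda.tolist() == with_list.tolist()
print(same)
```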
