Question

My DataFrame looks like this:

import pandas as pd
inp = [{'c1':10,'c2':100,'c3':100}, {'c1':10,'c2':100,'c3':110}, {'c1':10,'c2':100,'c3':120}, {'c1':11,'c2':100,'c3':100}, {'c1':11,'c2':100,'c3':110}, {'c1':11,'c2':100, 'c3':120}]
df = pd.DataFrame(inp)

This is how I aggregate it:

new_df = df.groupby(['c1', 'c2']).agg({"c3": [min,max]})

But the output is not what I expected. This is what I want:

inp = [{'c1':10, 'c2':100,'c3_min':100, 'c3_max':120},  {'c1':11, 'c2':100,'c3_min':100, 'c3_max':120}]
df = pd.DataFrame(inp)

What am I doing wrong? How can I get my expected output?

Answer 1

Try:

# tell pandas to use the vectorized functions with `'min', 'max'`
# instead of the builtins `min` and `max`;
# group by c1 and c2 and aggregate c3, as in the question
new_df = df.groupby(['c1', 'c2'], as_index=False)['c3'].agg(['min','max'])

Or, to match the expected output:

new_df = (df.groupby(['c1', 'c2'])['c3']
            .agg(['min','max'])
            .add_prefix('c3_')
            .reset_index()
         )
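
A further variant, not part of the answer above: named aggregation (available since pandas 0.25) lets you name the output columns directly, so no prefixing or index reset is needed. A minimal sketch, assuming the data from the question:

```python
import pandas as pd

inp = [{'c1': 10, 'c2': 100, 'c3': 100}, {'c1': 10, 'c2': 100, 'c3': 110},
       {'c1': 10, 'c2': 100, 'c3': 120}, {'c1': 11, 'c2': 100, 'c3': 100},
       {'c1': 11, 'c2': 100, 'c3': 110}, {'c1': 11, 'c2': 100, 'c3': 120}]
df = pd.DataFrame(inp)

# Named aggregation: each keyword is the output column name,
# each value is a (source column, aggregation name) pair.
new_df = (df.groupby(['c1', 'c2'], as_index=False)
            .agg(c3_min=('c3', 'min'), c3_max=('c3', 'max')))

print(new_df)
```

The result has flat columns c1, c2, c3_min and c3_max, with one row per (c1, c2) group.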

Answer 2

Another approach is to keep your current code and flatten the column index with pandas.MultiIndex.to_flat_index:

# Flatten the column index
new_df.columns = new_df.columns.to_flat_index()

# From tuples to string
new_df.rename(columns='_'.join, inplace=True)

# Reset the index
new_df.reset_index(inplace=True)

Output:

   c1   c2  c3_min  c3_max
0  10  100     100     120
1  11  100     100     120
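
To see why the flattening is needed, it may help to inspect the columns that the question's aggregation produces. A small sketch, assuming the question's data and using the string names 'min' and 'max' instead of the builtins; the list comprehension is just one alternative way to join the tuples:

```python
import pandas as pd

inp = [{'c1': 10, 'c2': 100, 'c3': 100}, {'c1': 10, 'c2': 100, 'c3': 110},
       {'c1': 10, 'c2': 100, 'c3': 120}, {'c1': 11, 'c2': 100, 'c3': 100},
       {'c1': 11, 'c2': 100, 'c3': 110}, {'c1': 11, 'c2': 100, 'c3': 120}]
df = pd.DataFrame(inp)

new_df = df.groupby(['c1', 'c2']).agg({'c3': ['min', 'max']})

# Passing a list of aggregations yields two-level (MultiIndex) columns:
# one (source column, aggregation) tuple per output column.
cols_before = list(new_df.columns)
print(cols_before)

# Join each tuple into a single string, then move the group keys
# from the row index back into ordinary columns.
new_df.columns = ['_'.join(col) for col in new_df.columns]
new_df = new_df.reset_index()
print(list(new_df.columns))
```

So the original code did compute the right values; the difference from the expected output is that the group keys sit in the row index and the column labels are tuples rather than flat names.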

Aggregate pandas df to get max and min as columns
