Question

I have a df with 2 numeric columns:

import numpy as np
import pandas as pd

DATA_ROWS = 5
df = pd.DataFrame({"id": [1] * DATA_ROWS, "x": [1, 2, 3, 4, 5],
                   "z": [1, 1, 1, 5, 6]})
df.set_index("id", drop=True, append=True, inplace=True)

          x  z
     id      
   0   1  1  1
   1   1  2  1
   2   1  3  1
   3   1  4  5
   4   1  5  6

(id is an index level)

I also have a list of functions:

funcs = [np.max, np.min, np.std, func1, func2]

So when I aggregate, I get:

df.aggregate(funcs)

                                                           x         z
amax                                                5.000000  6.000000
amin                                                1.000000  1.000000
std                                                 1.581139  2.489980
func1                                               7.000000  1.000000
func2                                              23.500000  6.200000

What I would like to get instead is the following:

   x_amax  x_amin  x_std  x_func1  x_func2  z_amax  z_amin  z_std  z_func1  z_func2
1   5.000   1.000 1.5811    7.000   23.500   6.000   1.000 2.4899    1.000   6.2000

I have read the documentation on pivot, melt, etc., but I cannot figure out how to do this. Any ideas?

Answer 1

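Reshape the aggregated result with unstack, turn it into a one-column DataFrame with to_frame, then transpose it with T. Finally, flatten the MultiIndex in the columns by using map with join: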

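# select the first value of the 'id' index level of the original frame
id1 = df.index.get_level_values('id')[0]

# aggregate, then reshape the result into a single row labelled id1
df = df.aggregate(funcs).unstack().to_frame(id1).T
df.columns = df.columns.map('_'.join)
print(df)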
   x_amax  x_amin     x_std  x_func1  x_func2  z_amax  z_amin    z_std  \
1     5.0     1.0  1.581139      3.0     15.0     6.0     1.0  2.48998   

   z_func1  z_func2  
1      2.8     14.0
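
As a self-contained sketch of the steps above: `func1` and `func2` are not given in the question, so the versions below (mean and sum) are stand-ins, and the names `agg` and `wide` are only illustrative. String names are passed for the built-in reductions, so the labels here read `max`/`min` rather than the `amax`/`amin` shown in the output above.

```python
import pandas as pd


def func1(s):
    # Stand-in for the question's func1 (assumption): column mean.
    return s.mean()


def func2(s):
    # Stand-in for the question's func2 (assumption): column sum.
    return s.sum()


df = pd.DataFrame({"id": [1] * 5, "x": [1, 2, 3, 4, 5], "z": [1, 1, 1, 5, 6]})
df = df.set_index("id", append=True)

funcs = ["max", "min", "std", func1, func2]

# One row per function, one column per original column.
agg = df.aggregate(funcs)

# First value of the 'id' level; it becomes the label of the single output row.
id1 = df.index.get_level_values("id")[0]

# unstack() gives a Series indexed by (column, function);
# to_frame(id1).T turns that Series into a one-row DataFrame.
wide = agg.unstack().to_frame(id1).T

# Flatten the two-level column labels into 'column_function' strings.
wide.columns = wide.columns.map("_".join)
print(wide)
```

Note that this keeps only one `id` label by construction, so it fits the single-id case and nothing more.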

Solution for multiple ids (it also works when there is only one unique id):

df = pd.DataFrame({"id":[1]*2 + [2]*3,"x":[1,2,3,4,5],
                        "z":[1,1,1,5,6]})
df.set_index("id", drop=True, append=True, inplace=True)

#sample functions
def func1(x):
    return x.mean()

def func2(x):
    return x.sum()


funcs = [np.max, np.min, np.std, func1, func2]

df = df.groupby(level='id').aggregate(funcs)
df.columns = df.columns.map('_'.join)
print (df)
    x_amax  x_amin     x_std  x_func1  x_func2  z_amax  z_amin     z_std  \
id                                                                         
1        2       1  0.707107      1.5        3       1       1  0.000000   
2        5       3  1.000000      4.0       12       6       1  2.645751   

    z_func1  z_func2  
id                    
1         1        2  
2         4       12
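
A possible alternative, not part of the original answer, is pandas' named aggregation, which produces the flat column names directly and so skips the flattening step. The sketch below assumes `id` is kept as a regular column and reuses the sample `func1`/`func2`; the dictionaries `funcs` and `named` are only illustrative helpers.

```python
import pandas as pd


def func1(s):
    # Sample function from the answer: mean of the group.
    return s.mean()


def func2(s):
    # Sample function from the answer: sum of the group.
    return s.sum()


df = pd.DataFrame({"id": [1] * 2 + [2] * 3,
                   "x": [1, 2, 3, 4, 5],
                   "z": [1, 1, 1, 5, 6]})

# Label -> aggregation; strings select built-in reductions.
funcs = {"max": "max", "min": "min", "std": "std",
         "func1": func1, "func2": func2}

# Build output_name=(column, aggregation) pairs for named aggregation.
named = {f"{col}_{label}": (col, f)
         for col in ["x", "z"]
         for label, f in funcs.items()}

wide = df.groupby("id").agg(**named)
print(wide)
```

The output names are chosen explicitly here, so they do not depend on how pandas labels a given function.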

Answer 2

I think you should use groupby here if you want one row per id.

df1 = df.groupby('id').agg([np.max, np.min, np.std, 'first', 'last'])
df1.columns =['_'.join(c) for c in df1.columns.values]

df1

    x_amax  x_amin     x_std  x_first  x_last  z_amax  z_amin    z_std  \
id                                                                       
1        5       1  1.581139        1       5       6       1  2.48998   

    z_first  z_last  
id                   
1         1       6
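
The list comprehension used here and the `columns.map('_'.join)` call from Answer 1 should give the same flat labels. A minimal sketch on a hand-built MultiIndex (the level values are just examples):

```python
import pandas as pd

# Two-level column labels like the ones agg() returns for a list of functions.
cols = pd.MultiIndex.from_product([["x", "z"], ["max", "min", "std"]])

# Each element of a MultiIndex is a tuple such as ('x', 'max'),
# so '_'.join turns it into 'x_max'.
via_map = list(cols.map("_".join))
via_comprehension = ["_".join(c) for c in cols.values]

print(via_map)
print(via_map == via_comprehension)
```

Both forms rely on every level value being a string; non-string labels would have to be converted first.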

Note that you can pass the string names of all the basic aggregation functions (mean / max / min / std / etc.) to agg, so this works too:

aggfuncs = ['max', 'min', 'std', func1, func2]
df1 = df.groupby('id').agg(aggfuncs)
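
A runnable version of this variant might look as follows. The bodies of `func1` and `func2` are assumptions (mean and sum, as in Answer 1's sample functions), since the question does not define them.

```python
import pandas as pd


def func1(s):
    # Assumed stand-in: mean of each group.
    return s.mean()


def func2(s):
    # Assumed stand-in: sum of each group.
    return s.sum()


df = pd.DataFrame({"id": [1] * 5, "x": [1, 2, 3, 4, 5], "z": [1, 1, 1, 5, 6]})
df = df.set_index("id", append=True)

# Strings for the built-in reductions, callables for the custom ones.
aggfuncs = ["max", "min", "std", func1, func2]
df1 = df.groupby("id").agg(aggfuncs)

# Flatten ('x', 'max') style labels into 'x_max'.
df1.columns = ["_".join(c) for c in df1.columns.values]
print(df1)
```

With string names, the second level of the labels is exactly the string passed in, so the flattened columns read `x_max` and `x_min`.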

Reshaping the result of pd.DataFrame.aggregate
