How to flatten a pandas DataFrameGroupBy

Time: 2019-07-18 19:03:13

Tags: python pandas dataframe pandas-groupby

I have a grouped object of type DataFrameGroupBy. I want to use it to aggregate some data, like so:

aggregated = grouped.aggregate([np.sum, np.mean], axis=1)

This returns a DataFrame of the form:

aggregated[:3].to_dict()
    """
    {('VALUE1', 'sum'): {
        ('US10adam034', 'PRCP'): 701,
        ('US10adam036', 'PRCP'): 1015,
        ('US10adam036', 'SNOW'): 46},
     ('VALUE1', 'mean'): {
        ('US10adam034', 'PRCP'): 100.14285714285714,
        ('US10adam036', 'PRCP'): 145.0,
        ('US10adam036', 'SNOW'): 46.0}}
    """

Printing the head gives:

                    VALUE1            
                       sum        mean
ID          ELEMENT                   
US10adam034 PRCP       701  100.142857
US10adam036 PRCP      1015  145.000000
            SNOW        46   46.000000
US10adam046 PRCP       790  131.666667
US10adam051 PRCP         5    0.555556
US10adam056 PRCP       540   31.764706
            SNOW        25    1.923077
            SNWD       165   15.000000
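
For anyone who wants to reproduce the shape of this result, here is a minimal sketch. The toy rows, the station IDs A1 and A2, and the string names 'sum' and 'mean' (used in place of np.sum and np.mean, with no axis argument) are assumptions for illustration, not the asker's data or exact call.

```python
import pandas as pd

# Toy stand-in for the weather data (assumed values, not the asker's data).
df = pd.DataFrame({
    "ID": ["A1", "A1", "A2", "A2", "A2"],
    "ELEMENT": ["PRCP", "PRCP", "PRCP", "SNOW", "SNOW"],
    "VALUE1": [10, 20, 5, 1, 3],
})

# Group on (ID, ELEMENT), then compute both statistics in one call.
grouped = df.groupby(["ID", "ELEMENT"])
aggregated = grouped.agg(["sum", "mean"])

# Rows are indexed by (ID, ELEMENT); columns are (VALUE1, sum) and (VALUE1, mean).
print(aggregated)
```

The result has a two-level row index and a two-level column index, which is the structure the rest of the question tries to flatten.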

This works well: it easily computes the sum and mean of my samples, grouped on the index (ID, ELEMENT). However, I would really like to get this into a single-row format, where ID is unique and the columns are a combination of ELEMENT and (sum|mean). I can almost get there using apply, like so:

def getNewSeries(t):
    # type(t) => Series
    element = t.name[1] # t.name is a tuple (ID, ELEMENT)
    sum_index=f'{element}sum'
    mean_index=f'{element}mean'
    return pd.Series(t['VALUE1'].values, index=[sum_index, mean_index])

aggregated.apply(getNewSeries, axis=1, result_type='expand')

Printing the head again, I get:

                       PRCPmean  PRCPsum   SNOWmean  SNOWsum  SNWDmean  ...
ID          ELEMENT
US10adam034 PRCP     100.142857    701.0        NaN      NaN       NaN
US10adam036 PRCP     145.000000   1015.0        NaN      NaN       NaN
            SNOW            NaN      NaN  46.000000     46.0       NaN
US10adam046 PRCP     131.666667    790.0        NaN      NaN       NaN
US10adam051 PRCP       0.555556      5.0        NaN      NaN       NaN
US10adam056 PRCP      31.764706    540.0        NaN      NaN       NaN
            SNOW            NaN      NaN   1.923077     25.0       NaN
            SNWD            NaN      NaN        NaN      NaN      15.0

I would like my final DataFrame to look like this:

               PRCPmean  PRCPsum   SNOWmean  SNOWsum  SNWDmean  ...
ID
US10adam034  100.142857    701.0        NaN      NaN       NaN
US10adam036  145.000000   1015.0  46.000000     46.0       NaN
US10adam046  131.666667    790.0        NaN      NaN       NaN
US10adam051    0.555556      5.0        NaN      NaN       NaN
US10adam056   31.764706    540.0   1.923077     25.0      15.0

Is there a way to use apply, agg or transform to aggregate this data into a single row? I also tried writing my own iterator over the unique IDs, but it was very slow. I like the simplicity of using agg to compute the sum/mean.

5 answers:

Answer 0: (score: 2)

I like using f-strings in a list comprehension. f-string formatting requires Python 3.6+.

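# df here is the aggregated frame from the question
df_out = df.unstack()['VALUE1']
# after unstack the column levels are (statistic, ELEMENT)
df_out.columns = [f'{j}{i}' for i, j in df_out.columns]
df_out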

Output:

             PRCPsum  SNOWsum    PRCPmean  SNOWmean
US10adam034    701.0      NaN  100.142857       NaN
US10adam036   1015.0     46.0  145.000000      46.0
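
The approach above can be sketched end to end as follows. The toy data, the variable names and the string aggregations 'sum' and 'mean' are assumptions for illustration; only unstack and the list comprehension come from the answer.

```python
import pandas as pd

# Toy data (assumed): A1 has only PRCP readings, A2 has PRCP and SNOW.
df = pd.DataFrame({
    "ID": ["A1", "A1", "A2", "A2", "A2"],
    "ELEMENT": ["PRCP", "PRCP", "PRCP", "SNOW", "SNOW"],
    "VALUE1": [10, 20, 5, 1, 3],
})
aggregated = df.groupby(["ID", "ELEMENT"]).agg(["sum", "mean"])

# Move ELEMENT from the row index into the columns, then drop the VALUE1 level.
wide = aggregated.unstack()["VALUE1"]

# Each remaining column label is a (statistic, ELEMENT) tuple; join it into one string.
wide.columns = [f"{element}{stat}" for stat, element in wide.columns]

print(wide)
```

Each ID now appears on exactly one row, and an ID/ELEMENT combination with no readings shows up as NaN.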

Answer 1: (score: 1)

You can do this:

new_df = agg_df.unstack(level=1)
new_df.columns = [c+b for _,b,c in new_df.columns.values]

Output:

             PRCPsum  SNOWsum    PRCPmean  SNOWmean
US10adam034    701.0      NaN  100.142857       NaN
US10adam036   1015.0     46.0  145.000000      46.0

Answer 2: (score: 1)

IIUC

aggregated = grouped['VALUE1'].aggregate([np.sum, np.mean], axis=1)
aggregated=aggregated.unstack()
aggregated.columns=aggregated.columns.map('{0[1]}|{0[0]}'.format) 
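
A self-contained sketch of the same idea, assuming toy data and the string aggregations 'sum' and 'mean' rather than the asker's exact call. It relies on Index.map calling the bound str.format method once per column tuple, so '{0[1]}' picks the ELEMENT and '{0[0]}' picks the statistic.

```python
import pandas as pd

# Toy data (assumed values).
df = pd.DataFrame({
    "ID": ["A1", "A1", "A2", "A2", "A2"],
    "ELEMENT": ["PRCP", "PRCP", "PRCP", "SNOW", "SNOW"],
    "VALUE1": [10, 20, 5, 1, 3],
})

# Selecting VALUE1 first means the columns are just the statistic names.
stats = df.groupby(["ID", "ELEMENT"])["VALUE1"].agg(["sum", "mean"])

# After unstack the column labels are (statistic, ELEMENT) tuples.
wide = stats.unstack()

# Map every tuple to a single "ELEMENT|statistic" string.
wide.columns = wide.columns.map("{0[1]}|{0[0]}".format)

print(wide)
```

Selecting the VALUE1 column before aggregating avoids the extra top column level, which is why the format string only needs two positions.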

Answer 3: (score: 0)

Please check whether reset_index works for what you need.
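
As a rough illustration of what reset_index does to a frame shaped like aggregated, here is a sketch. The toy data and the string aggregations are assumptions, and this is only one reading of the suggestion.

```python
import pandas as pd

# Toy data (assumed values).
df = pd.DataFrame({
    "ID": ["A1", "A1", "A2", "A2", "A2"],
    "ELEMENT": ["PRCP", "PRCP", "PRCP", "SNOW", "SNOW"],
    "VALUE1": [10, 20, 5, 1, 3],
})
aggregated = df.groupby(["ID", "ELEMENT"]).agg(["sum", "mean"])

# Drop the VALUE1 column level, then turn the (ID, ELEMENT) index into ordinary columns.
flat = aggregated["VALUE1"].reset_index()

print(flat)
```

Note that this gives a flat table with one row per (ID, ELEMENT) pair, not one row per ID, so on its own it does not produce the wide layout asked for in the question.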

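Answer 4: (score: 0)

I think you can try unstack() to reshape the data: it moves the innermost row index level to become the innermost column index level.

You can also use fill_value to replace the NaN values with 0.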
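
A minimal sketch of this suggestion, assuming toy data and the string aggregations 'sum' and 'mean'; the column-flattening step is added only to make the result easier to read.

```python
import pandas as pd

# Toy data (assumed): A1 has no SNOW readings.
df = pd.DataFrame({
    "ID": ["A1", "A1", "A2", "A2", "A2"],
    "ELEMENT": ["PRCP", "PRCP", "PRCP", "SNOW", "SNOW"],
    "VALUE1": [10, 20, 5, 1, 3],
})
aggregated = df.groupby(["ID", "ELEMENT"]).agg(["sum", "mean"])

# unstack moves ELEMENT into the columns; fill_value replaces the gaps
# that would otherwise be NaN for missing ID/ELEMENT combinations.
wide = aggregated.unstack(fill_value=0)

# Column labels are (VALUE1, statistic, ELEMENT); keep the last two parts.
wide.columns = [f"{element}{stat}" for _, stat, element in wide.columns]

print(wide)
```

Filling with 0 makes a missing ID/ELEMENT pair indistinguishable from a genuine zero, which matters for the mean columns, so it is worth checking that this is acceptable for the analysis.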