Weighted average in pandas

Time: 2017-11-08 13:32:29

Tags: python pandas

I have a dataframe with stand ID, tree species, height and volume:

import pandas as pd

# Example data: stand id, species, height and volume per row
df = pd.DataFrame({'STAND_ID': [1, 1, 2, 3, 3, 3],
                   'Species': ['Conifer', 'Broadleaves', 'Conifer', 'Broadleaves', 'Conifer', 'Conifer'],
                   'Height': [20, 19, 13, 24, 25, 18],
                   'Volume': [200, 100, 300, 50, 100, 10]})

   STAND_ID      Species  Height  Volume
0         1      Conifer      20     200
1         1  Broadleaves      19     100
2         2      Conifer      13     300
3         3  Broadleaves      24      50
4         3      Conifer      25     100
5         3      Conifer      18      10

I want to group by stand ID and species, unstack, and calculate the volume-weighted mean height, so I tried:

newdf=df.groupby(['STAND_ID','Species']).mean().unstack()

          Height              Volume        
Species  Broadleaves Conifer Broadleaves Conifer
STAND_ID                                        
1               19.0    20.0       100.0   200.0
2                NaN    13.0         NaN   300.0
3               24.0    21.5        50.0    55.0

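The height here is of course a plain mean, not a volume-weighted one. How do I weight by volume? For STAND_ID 3 and Conifer, the calculation should look like this: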

  

(25 * 100 + 18 * 10)/(100 + 10)= 24.4

2 answers:

Answer 0 (score: 4)

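If the lambda function is confusing, apply can also be used with a named function definition. (There is also the function numpy.average, which computes a weighted average.)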

import numpy as np
def weighted_average(group):
    weights = group['Volume']
    height = group['Height']

    return np.average(height,weights=weights)

df.groupby(['STAND_ID','Species']).apply(func = weighted_average).unstack()
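
For reference, here is a self-contained sketch of this approach on the question's data. Selecting the Height and Volume columns before apply is an addition here, on the assumption that it avoids the warning newer pandas versions emit when apply operates on the grouping columns. The variable name result is only illustrative.

```python
import numpy as np
import pandas as pd

# Same example data as in the question
df = pd.DataFrame({
    'STAND_ID': [1, 1, 2, 3, 3, 3],
    'Species': ['Conifer', 'Broadleaves', 'Conifer', 'Broadleaves', 'Conifer', 'Conifer'],
    'Height': [20, 19, 13, 24, 25, 18],
    'Volume': [200, 100, 300, 50, 100, 10],
})

def weighted_average(group):
    # Mean of Height, weighted by Volume, within one (stand, species) group
    return np.average(group['Height'], weights=group['Volume'])

# Restrict each group to the two columns the function needs, then pivot Species to columns
result = (
    df.groupby(['STAND_ID', 'Species'])[['Height', 'Volume']]
      .apply(weighted_average)
      .unstack()
)
print(result)
# result.loc[3, 'Conifer'] is 2680 / 110, about 24.36
```

Stand 2 has no Broadleaves rows, so that cell comes out as NaN after unstack.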

Answer 1 (score: 2)

If I understand correctly, one way is to use groupby with apply:

df
   STAND_ID      Species  Height  Volume
0         1      Conifer      20     200
1         1  Broadleaves      19     100
2         2      Conifer      13     300
3         3  Broadleaves      24      50
4         3      Conifer      25     100
5         3      Conifer      18      10

df.groupby(['STAND_ID','Species']).apply(lambda x: (x['Height'] * x['Volume'].div(x['Volume'].sum())).sum()).unstack()

Species   Broadleaves    Conifer
STAND_ID                        
1                19.0  20.000000
2                 NaN  13.000000
3                24.0  24.363636
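
A possible variant that is not part of either answer: the same weighted mean can be computed without apply by summing Height * Volume and Volume per group and dividing the two sums. This is a minimal sketch; the helper column name HxV and the variable names are hypothetical.

```python
import pandas as pd

# Same example data as in the question
df = pd.DataFrame({
    'STAND_ID': [1, 1, 2, 3, 3, 3],
    'Species': ['Conifer', 'Broadleaves', 'Conifer', 'Broadleaves', 'Conifer', 'Conifer'],
    'Height': [20, 19, 13, 24, 25, 18],
    'Volume': [200, 100, 300, 50, 100, 10],
})

# Per-group sums of the numerator (Height * Volume) and the denominator (Volume)
sums = (
    df.assign(HxV=df['Height'] * df['Volume'])   # HxV is a hypothetical helper column
      .groupby(['STAND_ID', 'Species'])[['HxV', 'Volume']]
      .sum()
)

# Weighted mean = sum(Height * Volume) / sum(Volume), then pivot Species to columns
result = (sums['HxV'] / sums['Volume']).unstack()
print(result)
```

Both versions rest on the same formula. This one uses only built-in groupby aggregations, which may matter on large frames, though that is not measured here.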