Applying np.average in a pandas pivot_table aggfunc

Time: 2019-07-02 04:07:44

Tags: pandas

I am trying to calculate a weighted average price using a pandas pivot table.

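I tried passing a dictionary to aggfunc.

The entry below looks like it should compute the correct weighted average, but it does not work when passed to aggfunc: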

'Price': lambda x: np.average(x, weights=df['Balance'])

I have also tried a manual groupby:

df.groupby('Product').agg({
    'Balance': sum,
    'Price': lambda x : np.average(x, weights='Balance'), 
    'Value': sum
})

This also raises an error: TypeError: Axis must be specified when shapes of a and weights differ.

Here is the sample data:

import pandas as pd
import numpy as np

price_dict = {'Product': {0: 'A',
  1: 'A',
  2: 'A',
  3: 'A',
  4: 'A',
  5: 'B',
  6: 'B',
  7: 'B',
  8: 'B',
  9: 'B',
  10: 'C',
  11: 'C',
  12: 'C',
  13: 'C',
  14: 'C'},
 'Balance': {0: 10,
  1: 20,
  2: 30,
  3: 40,
  4: 50,
  5: 60,
  6: 70,
  7: 80,
  8: 90,
  9: 100,
  10: 110,
  11: 120,
  12: 130,
  13: 140,
  14: 150},
 'Price': {0: 1,
  1: 2,
  2: 3,
  3: 4,
  4: 5,
  5: 6,
  6: 7,
  7: 8,
  8: 9,
  9: 10,
  10: 11,
  11: 12,
  12: 13,
  13: 14,
  14: 15},
 'Value': {0: 10,
  1: 40,
  2: 90,
  3: 160,
  4: 250,
  5: 360,
  6: 490,
  7: 640,
  8: 810,
  9: 1000,
  10: 1210,
  11: 1440,
  12: 1690,
  13: 1960,
  14: 2250}}

My attempt at computing the average by passing a dict to aggfunc (with np.mean for Price, which gives the unweighted mean):

df = pd.DataFrame(price_dict)

df.pivot_table(
    index='Product',
    aggfunc = {
        'Balance': sum,
        'Price': np.mean,
        'Value': sum
    }
)

Output:

         Balance  Price  Value
Product
A            150      3    550
B            400      8   3300
C            650     13   8550

The expected result should be:

         Balance  Price  Value
Product
A            150   3.66    550
B            400   8.25   3300
C            650  13.15   8550
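
As a quick sanity check on the expected numbers, the weighted average for product A can be worked out by hand from the sample rows. This is only a small sketch; the variable names `prices`, `balances`, `manual`, and `via_numpy` are made up for illustration.

```python
import numpy as np

# Product A rows from the sample data
prices = np.array([1, 2, 3, 4, 5])
balances = np.array([10, 20, 30, 40, 50])

# Weighted average by definition: sum(price * balance) / sum(balance)
manual = (prices * balances).sum() / balances.sum()

# The same quantity via np.average, with weights the same length as prices
via_numpy = np.average(prices, weights=balances)

print(manual, via_numpy)
```

Both expressions evaluate 550 / 150, which is the figure the Price column should show for product A.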

1 answer:

Answer 0 (score: 2)

Here is one way, using apply:
df.groupby('Product').apply(lambda x : pd.Series(
    {'Balance': x['Balance'].sum(),
    'Price': np.average(x['Price'], weights=x['Balance']), 
    'Value': x['Value'].sum()}))
Out[57]: 
         Balance      Price   Value
Product                            
A          150.0   3.666667   550.0
B          400.0   8.250000  3300.0
C          650.0  13.153846  8550.0
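
For what it's worth, both attempts in the question seem to fail for the same reason: np.average needs weights with the same shape as the values it receives. Inside aggfunc the function sees only one group's Price values, while df['Balance'] is the whole column and 'Balance' is just a string. If you would rather keep pivot_table, one possible workaround is to look up the weights by the group's index labels. This is a sketch, not tested against every pandas version; `weighted_price` is a hypothetical helper name, the frame is a cut-down copy of the sample data (products A and B only), and it assumes the aggregation function receives each group's values with their original index.

```python
import numpy as np
import pandas as pd

# Cut-down stand-in for the question's data (products A and B only)
df = pd.DataFrame({
    "Product": ["A"] * 5 + ["B"] * 5,
    "Balance": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    "Price": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})
df["Value"] = df["Price"] * df["Balance"]

def weighted_price(prices):
    # `prices` is one group's Price Series; its index labels select
    # the matching Balance rows, so values and weights have equal length.
    return np.average(prices, weights=df.loc[prices.index, "Balance"])

result = df.pivot_table(
    index="Product",
    aggfunc={"Balance": "sum", "Price": weighted_price, "Value": "sum"},
)
print(result)
```

Note that this relies on `df` being visible from inside the function, which is why the apply version above is arguably cleaner: there each group arrives as a full DataFrame with both columns.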