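I am trying to calculate a weighted average price in a Pandas pivot table.
I tried groupby, which works fine with np.average. However, I have not been able to replicate the result using pd.pivot_table.
I have a DataFrame constructed from a dictionary: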
import numpy as np
import pandas as pd

dict_data = {
'Contract' : ['Contract 1', 'Contract 2', 'Contract 3', 'Contract 4', 'Contract 5', 'Contract 6', 'Contract 7', 'Contract 8', 'Contract 9', 'Contract 10', 'Contract 11', 'Contract 12'],
'Contract_Date': ['01/01/2019', '02/02/2019', '03/03/2019', '04/03/2019', '01/01/2019', '02/02/2019', '03/03/2019', '04/03/2019', '01/01/2019', '02/02/2019', '03/03/2019', '04/03/2019'],
'Product': ['A','A','A','A','B','B','B','B', 'C','C','C','C'],
'Delivery' : ['2019-01', '2019-01', '2019-02', '2019-03', '2019-01', '2019-01', '2019-02', '2019-03', '2019-01', '2019-01', '2019-02', '2019-03'],
'Price' : [90, 95, 100, 105, 90, 95, 100, 105, 90, 95, 100, 105],
'Balance': [50, 100, 150, 200, 50, 100, 150, 200, 50, 100, 150, 200]
}
df = pd.DataFrame.from_dict(dict_data)
df
       Contract Contract_Date Product Delivery  Price  Balance
0    Contract 1    01/01/2019       A  2019-01     90       50
1    Contract 2    02/02/2019       A  2019-01     95      100
2    Contract 3    03/03/2019       A  2019-02    100      150
3    Contract 4    04/03/2019       A  2019-03    105      200
4    Contract 5    01/01/2019       B  2019-01     90       50
5    Contract 6    02/02/2019       B  2019-01     95      100
6    Contract 7    03/03/2019       B  2019-02    100      150
7    Contract 8    04/03/2019       B  2019-03    105      200
8    Contract 9    01/01/2019       C  2019-01     90       50
9   Contract 10    02/02/2019       C  2019-01     95      100
10  Contract 11    03/03/2019       C  2019-02    100      150
11  Contract 12    04/03/2019       C  2019-03    105      200
Weighted average calculation using groupby:
df.groupby(['Product', 'Delivery']).apply(lambda x: np.average(x.Price, weights=x.Balance))
Output:
Product  Delivery
A        2019-01      93.333333
         2019-02     100.000000
         2019-03     105.000000
B        2019-01      93.333333
         2019-02     100.000000
         2019-03     105.000000
C        2019-01      93.333333
         2019-02     100.000000
         2019-03     105.000000
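As a sanity check on the 93.333333 figure, the weighted average for Product A, Delivery 2019-01 can be reproduced by hand from Contract 1 and Contract 2 in the table above. This is only a small sketch of the arithmetic; the variable names are made up for illustration.

```python
import numpy as np

# Product A, Delivery 2019-01: Contract 1 and Contract 2 from the table above
prices = np.array([90, 95])
balances = np.array([50, 100])

# Weighted average = sum(price * balance) / sum(balance)
by_hand = (prices * balances).sum() / balances.sum()

# np.average with weights should compute the same quantity
via_numpy = np.average(prices, weights=balances)

print(by_hand, via_numpy)
```

Both values come out to 14000 / 150, which matches the groupby result for that group.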
This is what I tried, and where I am stuck:
# Define a dictionary with the functions to apply for a given column:
f = {'Balance': ['sum'], 'Price': [np.average(df.Price, weights=df.Balance)] }
# Construct a pivot table, applying the weighted average price function to 'Price'
df.pivot_table(
columns='Delivery',
values=['Balance', 'Price'],
index='Product',
aggfunc=f
).swaplevel(1,0,axis=1).sort_index(axis=1)
Expected output (showing the two values Balance and Price under the shared Delivery column):
Delivery 2019-01         2019-02       2019-03
         Balance   Price Balance Price Balance Price
Product
A            150  93.333     150   100     200   105
B            150  93.333     150   100     200   105
C            150  93.333     150   100     200   105
Answer 0 (score: 1)
I think you can fix your code like this:
df.groupby(['Product', 'Delivery']).\
apply(lambda x: pd.Series([np.average(x.Price, weights=x.Balance),x.Balance.sum()],index=['Price','Balance'])).unstack()
Out[21]:
              Price                 Balance
Delivery    2019-01 2019-02 2019-03 2019-01 2019-02 2019-03
Product
A         93.333333   100.0   105.0   150.0   150.0   200.0
B         93.333333   100.0   105.0   150.0   150.0   200.0
C         93.333333   100.0   105.0   150.0   150.0   200.0
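If you would rather stay with pd.pivot_table, one possible approach is sketched below. In the question's dictionary, np.average(df.Price, weights=df.Balance) is evaluated immediately, so aggfunc receives a single number instead of a function. The sketch passes a callable instead. The helper name weighted_price is made up for illustration, and it assumes df has a unique index, because it looks up each group's Balance values in the outer df by index label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Contract': ['Contract %d' % i for i in range(1, 13)],
    'Product': ['A'] * 4 + ['B'] * 4 + ['C'] * 4,
    'Delivery': ['2019-01', '2019-01', '2019-02', '2019-03'] * 3,
    'Price': [90, 95, 100, 105] * 3,
    'Balance': [50, 100, 150, 200] * 3,
})

def weighted_price(prices):
    # Hypothetical helper: `prices` is the Price Series of one
    # (Product, Delivery) group. Its index labels are used to fetch the
    # matching weights from the outer df (assumes df's index is unique).
    return np.average(prices, weights=df.loc[prices.index, 'Balance'])

result = (
    df.pivot_table(
        index='Product',
        columns='Delivery',
        values=['Balance', 'Price'],
        # Pass callables / function names, not an already computed number
        aggfunc={'Balance': 'sum', 'Price': weighted_price},
    )
    .swaplevel(0, 1, axis=1)  # put Delivery on the top column level
    .sort_index(axis=1)
)

print(result)
```

This should give Balance and Price side by side under each Delivery value, as in the expected output from the question. The groupby version above avoids the lookup into the outer df, so it may be the more robust choice if the index is not unique.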