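I'm having trouble building a pandas pivot table. I want the two values ['Balance', 'WAP'] to appear under the same ['Delivery'] column.
Here is the DataFrame, built from a dictionary: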
import numpy as np
import pandas as pd

dict_data = {
    'Contract' : ['Contract 1', 'Contract 2', 'Contract 3', 'Contract 4'],
    'Contract_Date': ['01/01/2019', '02/02/2019', '03/03/2019', '04/03/2019'],
    'Delivery' : ['2019-01', '2019-01', '2019-02', '2019-03'],
    'Price' : [90, 95, 100, 105],
    'Balance': [50, 100, 150, 200]
}
df = pd.DataFrame.from_dict(dict_data)
df
DataFrame:
     Contract Contract_Date Delivery  Price  Balance
0  Contract 1    01/01/2019  2019-01     90       50
1  Contract 2    02/02/2019  2019-01     95      100
2  Contract 3    03/03/2019  2019-02    100      150
3  Contract 4    04/03/2019  2019-03    105      200
Calculating the weighted average price:
# Create WAP - Weighted Average Price
df['Value'] = df['Balance'] * df['Price']
df['WAP'] = df['Value'] / df['Balance']
df
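A side note on the calculation above: dividing Value by Balance row by row simply returns Price, so no weighting takes place. The sketch below assumes the intent is a balance-weighted price per Delivery month, which is not stated in the question; the names row_wap, totals and wap_by_delivery are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Delivery': ['2019-01', '2019-01', '2019-02', '2019-03'],
    'Price': [90, 95, 100, 105],
    'Balance': [50, 100, 150, 200],
})
df['Value'] = df['Balance'] * df['Price']

# Row by row, Value / Balance just gives Price back.
row_wap = df['Value'] / df['Balance']

# Assumed intent: weight each price by its balance within a Delivery month,
# i.e. total value divided by total balance per group.
totals = df.groupby('Delivery')[['Value', 'Balance']].sum()
wap_by_delivery = totals['Value'] / totals['Balance']
print(wap_by_delivery)
```

With one row per contract in the pivot index, as in the table built below, each cell holds a single contract, so the row-wise figure and the price coincide there either way.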
Building the pivot table:
# Use a dictionary to apply more than 1 type of aggregate onto the data
f = {'Balance': ['sum'], 'WAP': ['mean']}
df.pivot_table(
    columns='Delivery',
    values=['Balance', 'WAP'],
    index=['Contract_Date', 'Contract'],
    aggfunc=f
).replace(np.nan, '')
I'm trying to get the two values displayed under the same column so they can be compared, as in the table below (built by hand):
Delivery                   2019-01        2019-02        2019-03
Contract Date  Contract    Balance  WAP   Balance  WAP   Balance  WAP
01/01/2019     Contract 1       50   90
02/02/2019     Contract 2      100   95
03/03/2019     Contract 3                     150  100
04/03/2019     Contract 4                                    200  105
I'm thinking the solution lies somewhere along the lines of stack/unstack? I'd really appreciate any help, as I'm still new to pandas.
Answer 0 (score: 1)
First, change the one-element lists in the dictionary to plain strings, to avoid a 3-level MultiIndex in the columns:
f = {'Balance': 'sum', 'WAP': 'mean'}
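To see why this matters, the following sketch compares the number of column levels produced by the two forms of the aggfunc dictionary. It rebuilds the question's data so it runs on its own; the names kwargs, with_lists and with_strings are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Contract': ['Contract 1', 'Contract 2', 'Contract 3', 'Contract 4'],
    'Contract_Date': ['01/01/2019', '02/02/2019', '03/03/2019', '04/03/2019'],
    'Delivery': ['2019-01', '2019-01', '2019-02', '2019-03'],
    'Price': [90, 95, 100, 105],
    'Balance': [50, 100, 150, 200],
})
df['WAP'] = df['Balance'] * df['Price'] / df['Balance']

# Arguments shared by both pivot_table calls.
kwargs = dict(columns='Delivery', values=['Balance', 'WAP'],
              index=['Contract_Date', 'Contract'])

# Lists of function names add an extra column level holding the function name.
with_lists = df.pivot_table(aggfunc={'Balance': ['sum'], 'WAP': ['mean']}, **kwargs)
# Plain strings leave only the value name and the Delivery level.
with_strings = df.pivot_table(aggfunc={'Balance': 'sum', 'WAP': 'mean'}, **kwargs)

levels_with_lists = with_lists.columns.nlevels
levels_with_strings = with_strings.columns.nlevels
print(levels_with_lists, levels_with_strings)
```

With only two levels, a single swaplevel call is enough to move Delivery to the top.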
Then use DataFrame.swaplevel together with DataFrame.sort_index:
f = {'Balance': 'sum', 'WAP': 'mean'}
df = (df.pivot_table(
          columns='Delivery',
          values=['Balance', 'WAP'],
          index=['Contract_Date', 'Contract'],
          aggfunc=f
      ).replace(np.nan, '')
       .swaplevel(1, 0, axis=1)
       .sort_index(axis=1))
print(df)
Delivery                 2019-01     2019-02      2019-03
                         Balance WAP Balance  WAP Balance  WAP
Contract_Date Contract
01/01/2019    Contract 1      50  90
02/02/2019    Contract 2     100  95
03/03/2019    Contract 3                 150  100
04/03/2019    Contract 4                              200  105
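Since the question mentions stack/unstack, here is a rough sketch of that route as an alternative. It assumes every (Contract_Date, Contract, Delivery) combination occurs only once, as in the sample data, so no aggregation is needed; the name wide is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Contract': ['Contract 1', 'Contract 2', 'Contract 3', 'Contract 4'],
    'Contract_Date': ['01/01/2019', '02/02/2019', '03/03/2019', '04/03/2019'],
    'Delivery': ['2019-01', '2019-01', '2019-02', '2019-03'],
    'Price': [90, 95, 100, 105],
    'Balance': [50, 100, 150, 200],
})
df['WAP'] = df['Balance'] * df['Price'] / df['Balance']

wide = (
    df.set_index(['Contract_Date', 'Contract', 'Delivery'])[['Balance', 'WAP']]
      .unstack('Delivery')        # move Delivery from the rows into the columns
      .swaplevel(0, 1, axis=1)    # put Delivery above Balance/WAP
      .sort_index(axis=1)         # group Balance and WAP under each Delivery
)
print(wide)
```

This version leaves the empty cells as NaN, which keeps the columns numeric for further calculations; blanking them out with an empty string, as in the answer above, is probably best left as a final display step.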