I have the following code, one part of which pivots a SQL table retrieved from an Oracle database:
s = "SELECT Country || '_' || Product || '_' || Flow Ref, " + \
"Country, Product, Flow, zm, Qty " + \
"FROM Volumes "
#Following will simply pull from db into a dataframe
df = fb.QueryDB(s)
#Put ZM as column headers
df = df.pivot(values = 'QTY', index = 'REF', columns = 'ZM')
#Format the column headers
df.columns = [x.strftime('%b-%Y') for x in df.columns]
This all works fine and I get a dataframe like:
       Mar-2017  Apr-2017
Ref
A_B_C       100       110
D_E_F       500       210
G_H_I       310       150
Except that now I would like to create a multi-index, as follows:
                               Mar-2017  Apr-2017
Ref    Country  Product  Flow
A_B_C  A        B        C          100       110
D_E_F  D        E        F          500       210
G_H_I  G        H        I          310       150
To do this, I changed the line that pivots the dataframe to:
df = df.pivot(values = 'QTY', index = ['REF','COUNTRY','PRODUCT','FLOW'], columns = 'ZM')
This produces the following error:
ValueError: Wrong number of items passed 1859796, placement implies 4
Any help is much appreciated.
Answer 0: (score: 1)
import pandas as pd

# Sample data in long format, one row per (REF, ZM) pair
data = {'REF': ['A_B_C', 'D_E_F', 'G_H_I', 'A_B_C', 'D_E_F', 'G_H_I'],
        'COUNTRY': list('ADGADG'),
        'PRODUCT': list('BEHBEH'),
        'FLOW': list('CFICFI'),
        'QTY': [100, 500, 310, 110, 210, 150],
        'ZM': pd.to_datetime(['2017-03-01'] * 3 + ['2017-04-01'] * 3)}
df = pd.DataFrame(data)
print (df)
  COUNTRY FLOW PRODUCT  QTY    REF         ZM
0       A    C       B  100  A_B_C 2017-03-01
1       D    F       E  500  D_E_F 2017-03-01
2       G    I       H  310  G_H_I 2017-03-01
3       A    C       B  110  A_B_C 2017-04-01
4       D    F       E  210  D_E_F 2017-04-01
5       G    I       H  150  G_H_I 2017-04-01
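# Move the key columns and ZM into the index, then unstack ZM into columns.
# The parentheses are needed so the chained calls can span several lines.
df = (df.set_index(['REF', 'COUNTRY', 'PRODUCT', 'FLOW', 'ZM'])['QTY']
        .unstack()
        .rename_axis(None, axis=1))
# Format the datetime column headers as e.g. Mar-2017
df.columns = df.columns.strftime('%b-%Y')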
print (df)
                             Mar-2017  Apr-2017
REF   COUNTRY PRODUCT FLOW
A_B_C A       B       C           100       110
D_E_F D       E       F           500       210
G_H_I G       H       I           310       150
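As a side note, newer pandas releases (1.1.0 and later, as far as I know) let DataFrame.pivot take a list for index, so the call from the question may work there directly. A minimal sketch on the same sample data, assuming such a version is installed; the names long_df and wide are only illustrative and not from the original code:

```python
import pandas as pd

# Same sample data as above (long format, no duplicate key/ZM pairs)
data = {'REF': ['A_B_C', 'D_E_F', 'G_H_I', 'A_B_C', 'D_E_F', 'G_H_I'],
        'COUNTRY': list('ADGADG'),
        'PRODUCT': list('BEHBEH'),
        'FLOW': list('CFICFI'),
        'QTY': [100, 500, 310, 110, 210, 150],
        'ZM': pd.to_datetime(['2017-03-01'] * 3 + ['2017-04-01'] * 3)}
long_df = pd.DataFrame(data)

# Assumption: pandas >= 1.1, where pivot accepts a list for index
wide = long_df.pivot(index=['REF', 'COUNTRY', 'PRODUCT', 'FLOW'],
                     columns='ZM',
                     values='QTY')

# Format the datetime column headers as e.g. Mar-2017
wide.columns = wide.columns.strftime('%b-%Y')
print(wide)
```

On older versions the set_index + unstack form above is the safer choice, and pivot still raises an error when a key/ZM combination is duplicated.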
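If unstack returns the error:
ValueError: Index contains duplicate entries, cannot reshape
then you need pivot_table with an aggregate function, which is applied wherever there are duplicates: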
data = {'REF' : ['A_B_C','A_B_C','G_H_I','A_B_C','D_E_F','G_H_I'],
'COUNTRY' : list('AAGADG'),
'PRODUCT' : list('BBHBEH'),
'FLOW' : list('CCICFI'),
'QTY':[100,500,310,110,210,150],
'ZM':pd.to_datetime(['2017-03-01'] * 3 + ['2017-04-01'] * 3 )}
df = pd.DataFrame(data)
print (df)
  COUNTRY FLOW PRODUCT  QTY    REF         ZM
0       A    C       B  100  A_B_C 2017-03-01 <- duplicate COUNTRY,FLOW,PRODUCT,REF,ZM
1       A    C       B  500  A_B_C 2017-03-01 <- duplicate COUNTRY,FLOW,PRODUCT,REF,ZM
2       G    I       H  310  G_H_I 2017-03-01
3       A    C       B  110  A_B_C 2017-04-01
4       D    F       E  210  D_E_F 2017-04-01
5       G    I       H  150  G_H_I 2017-04-01
df = df.pivot_table(values = 'QTY',
index = ['REF','COUNTRY','PRODUCT','FLOW'],
columns = 'ZM',
aggfunc='mean')
df.columns = df.columns.strftime('%b-%Y')
print (df)
                             Mar-2017  Apr-2017
REF   COUNTRY PRODUCT FLOW
A_B_C A       B       C         300.0     110.0
D_E_F D       E       F           NaN     210.0
G_H_I G       H       I         310.0     150.0
Or groupby + aggregate function + unstack:
# Start again from the original long-format df (before pivot_table was applied)
df = df.groupby(['REF','COUNTRY','PRODUCT','FLOW', 'ZM'])['QTY'].mean().unstack()
df.columns = df.columns.strftime('%b-%Y')
print (df)
                             Mar-2017  Apr-2017
REF   COUNTRY PRODUCT FLOW
A_B_C A       B       C         300.0     110.0
D_E_F D       E       F           NaN     210.0
G_H_I G       H       I         310.0     150.0
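Before settling on an aggregate function it can help to look at which rows actually collide. The sketch below rebuilds the sample data with duplicates, lists the colliding rows with DataFrame.duplicated, and then aggregates with sum instead of mean. Whether sum or mean is right for QTY depends on what the duplicates mean in your data, so treat that choice as an assumption; long_df, keys, dupes and summed are illustrative names.

```python
import pandas as pd

# Sample data with a duplicated (REF, COUNTRY, PRODUCT, FLOW, ZM) combination
data = {'REF': ['A_B_C', 'A_B_C', 'G_H_I', 'A_B_C', 'D_E_F', 'G_H_I'],
        'COUNTRY': list('AAGADG'),
        'PRODUCT': list('BBHBEH'),
        'FLOW': list('CCICFI'),
        'QTY': [100, 500, 310, 110, 210, 150],
        'ZM': pd.to_datetime(['2017-03-01'] * 3 + ['2017-04-01'] * 3)}
long_df = pd.DataFrame(data)

keys = ['REF', 'COUNTRY', 'PRODUCT', 'FLOW', 'ZM']

# keep=False marks every row of a duplicated group, not only the later ones
dupes = long_df[long_df.duplicated(subset=keys, keep=False)]
print(dupes)

# Assumption: duplicated quantities should be added together
summed = long_df.pivot_table(values='QTY',
                             index=['REF', 'COUNTRY', 'PRODUCT', 'FLOW'],
                             columns='ZM',
                             aggfunc='sum')
print(summed)
```

If dupes comes back empty, the plain set_index + unstack approach should work without any aggregation.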