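I have a DataFrame that I group by two columns ('Call', 'month')
to produce the following (sensitive information redacted):
The code I use (after fetching the relevant rows from our SQL database) is: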
a01=[]
for row in rows:
    a01.append({'GrantRefNumber': row[0], 'Call': row[1], 'FirstReceivedDate': row[2], 'TotalGrantValue': row[3]})
df = pd.DataFrame(a01)
new_df01 = df[['Call','FirstReceivedDate','TotalGrantValue']]
new_df01['month'] = pd.Categorical(new_df01['FirstReceivedDate'].dt.strftime('%b'),
                                   categories=vals, ordered=True)
groupA01 = new_df01.groupby(['month','Call']).agg({'TotalGrantValue':sum, 'FirstReceivedDate':'count'}).rename(columns={'FirstReceivedDate':'Count'})
groupA01['TotalGrantValue'] = groupA01['TotalGrantValue'].map('{:,.2f}'.format)
groupA01
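What I would like is to have 'Call' as the rows and the months running across the top, with 'Count' and 'TotalGrantValue' under each month. Something like:
Can anyone help?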
Answer 0 (score: 3)
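You need unstack to reshape, then swaplevel to reorder the MultiIndex in the columns, and finally sort_index to sort the columns:
df = gA.unstack(0).swaplevel(0, 1, axis=1).sort_index(axis=1)
Full example: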
import pandas as pd

# sample data
rng = pd.date_range('2017-04-03', periods=20, freq='20d')
aDF = pd.DataFrame({'FirstReceivedDate': rng, 'TotalGrantValue': range(20),
                    'Call': list('aaaaabbbbbcccccddddd')})
#print (aDF)
# copy the slice so that adding the 'month' column does not trigger SettingWithCopyWarning
rgbDF = aDF[['FirstReceivedDate', 'TotalGrantValue', 'Call']].copy()
# month labels in financial-year order, used as ordered categories
vals = ['Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'Jan', 'Feb', 'Mar']
rgbDF['month'] = pd.Categorical(rgbDF['FirstReceivedDate'].dt.strftime('%b'),
                                categories=vals, ordered=True)
gA = rgbDF.groupby(['month', 'Call']) \
          .agg({'TotalGrantValue': 'sum', 'FirstReceivedDate': 'count'}) \
          .rename(columns={'FirstReceivedDate': 'Count'})
gA['TotalGrantValue'] = gA['TotalGrantValue'].map('{:,.2f}'.format)
Sample:
print (gA)
            TotalGrantValue  Count
month Call
Apr   a                1.00      2
      d               19.00      1
May   a                2.00      1
Jun   a                7.00      2
Jul   b                5.00      1
Aug   b               13.00      2
Sep   b               17.00      2
Oct   c               10.00      1
Nov   c               23.00      2
Dec   c               13.00      1
Jan   c               14.00      1
      d               15.00      1
Feb   d               16.00      1
Mar   d               35.00      2
df = gA.unstack(0).swaplevel(0, 1, axis=1).sort_index(axis=1)
print (df)
month    Apr                     May                     Jun                     Jul \
       Count  TotalGrantValue  Count  TotalGrantValue  Count  TotalGrantValue  Count
Call
a        2.0             1.00    1.0             2.00    2.0             7.00    NaN
b        NaN             None    NaN             None    NaN             None    1.0
c        NaN             None    NaN             None    NaN             None    NaN
d        1.0            19.00    NaN             None    NaN             None    NaN

month                     Aug                   ...    Nov \
       TotalGrantValue  Count  TotalGrantValue  ...  Count
Call                                            ...
a                 None    NaN             None  ...    NaN
b                 5.00    2.0            13.00  ...    NaN
c                 None    NaN             None  ...    2.0
d                 None    NaN             None  ...    NaN

month                     Dec                     Jan                     Feb \
       TotalGrantValue  Count  TotalGrantValue  Count  TotalGrantValue  Count
Call
a                 None    NaN             None    NaN             None    NaN
b                 None    NaN             None    NaN             None    NaN
c                23.00    1.0            13.00    1.0            14.00    NaN
d                 None    NaN             None    1.0            15.00    1.0

month                     Mar
       TotalGrantValue  Count  TotalGrantValue
Call
a                 None    NaN             None
b                 None    NaN             None
c                 None    NaN             None
d                16.00    2.0            35.00

[4 rows x 24 columns]
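The None and NaN entries in the reshaped output appear because the totals were turned into strings before unstacking, so missing month/Call pairs cannot be filled numerically. As a minimal sketch (not part of the original answer), one option is to reshape while the values are still numeric, fill the gaps with 0, and format last. The toy frame and the names toy, grouped, wide and formatted are hypothetical, and observed=True is an assumption that only months present in the data are wanted.

```python
import pandas as pd

# Hypothetical toy data standing in for the real grants table
toy = pd.DataFrame({
    "Call": ["a", "a", "b", "b"],
    "month": pd.Categorical(["Apr", "May", "Apr", "Apr"],
                            categories=["Apr", "May"], ordered=True),
    "TotalGrantValue": [10.0, 20.0, 5.0, 7.0],
})

# Aggregate, keeping TotalGrantValue numeric for now
grouped = (toy.groupby(["month", "Call"], observed=True)
              .agg(TotalGrantValue=("TotalGrantValue", "sum"),
                   Count=("TotalGrantValue", "count")))

# Move 'month' into the columns, fill missing pairs with 0,
# put 'month' on the top column level and sort the columns
wide = (grouped.unstack("month", fill_value=0)
               .swaplevel(0, 1, axis=1)
               .sort_index(axis=1))

# Format the totals as strings only after reshaping
formatted = wide.apply(
    lambda s: s.map("{:,.2f}".format) if s.name[1] == "TotalGrantValue" else s
)
print(formatted)
```

Keeping wide numeric also means it can still be summed or plotted; formatted is only for display.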
Answer 1 (score: 1)
You can use the pd.pivot_table() function on new_df01.
Try:
data_p = pd.pivot_table(new_df01, values=['TotalGrantValue'], index=['Call'], columns=['month'], aggfunc=('count', 'mean'))
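The question asks for a total and a count per month rather than a mean, so the call above may need adjusting. The following is a rough sketch of that variant under stated assumptions: the toy frame and the names toy and pivoted are hypothetical, and 'sum' and 'count' are swapped in for the aggregation functions used in the answer.

```python
import pandas as pd

# Hypothetical toy data standing in for new_df01
toy = pd.DataFrame({
    "Call": ["a", "a", "b", "b"],
    "month": pd.Categorical(["Apr", "May", "Apr", "Apr"],
                            categories=["Apr", "May"], ordered=True),
    "TotalGrantValue": [10.0, 20.0, 5.0, 7.0],
})

# One row per Call, one column per (aggregation, month) pair
pivoted = pd.pivot_table(toy, values="TotalGrantValue", index="Call",
                         columns="month", aggfunc=["sum", "count"],
                         fill_value=0, observed=True)

# Rename the aggregation level, move 'month' to the top level, sort columns
pivoted = (pivoted.rename(columns={"sum": "TotalGrantValue", "count": "Count"},
                          level=0)
                  .swaplevel(0, 1, axis=1)
                  .sort_index(axis=1))
print(pivoted)
```

This gives the same layout as the unstack approach in a single call, at the cost of a rename step to recover the 'Count' and 'TotalGrantValue' labels.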