Python Pandas - 在不同方向分组两列

时间:2017-07-20 10:56:26

标签: python pandas

我有一个数据框,我按两列('Call', 'month')分组以生成(编辑敏感信息):

enter image description here

我使用的代码(从我们的SQL数据库中获取相关行之后)是:

a01=[]

for row in rows:
    a01.append({'GrantRefNumber':row[0],'Call': row [1],'FirstReceivedDate':row[2],'TotalGrantValue':row[3]})

df = pd.DataFrame(a01)
new_df01 = df[['Call','FirstReceivedDate','TotalGrantValue']]
new_df01['month'] = pd.Categorical(new_df01['FirstReceivedDate'].dt.strftime('%b'), 
                                 categories=vals, ordered=True) 


groupA01 = new_df01.groupby(['month','Call']).agg({'TotalGrantValue':sum, 'FirstReceivedDate':'count'}).rename(columns={'FirstReceivedDate':'Count'})
groupA01['TotalGrantValue'] = groupA01['TotalGrantValue'].map('{:,.2f}'.format)
groupA01

我想要做的就是让'Call'成为行,并且每个'Count'和'TotalGrantValue'的月份都会超过一个月。像:

enter image description here

有人可以帮忙吗?

2 个答案:

答案 0 :(得分:3)

您需要unstack进行重塑,然后MultiIndex位于df = gA.unstack(0).swaplevel(0,1,1).sort_index(1) 列中,最后按swaplevel排序:

#sample data
rng = pd.date_range('2017-04-03', periods=20, freq='20d')
aDF = pd.DataFrame({'FirstReceivedDate': rng, 'TotalGrantValue': range(20),
                    'Call':list('aaaaabbbbbcccccddddd')})  
#print (aDF)

rgbDF = aDF[['FirstReceivedDate','TotalGrantValue', 'Call']].copy()

vals = ['Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'Jan', 'Feb', 'Mar']
rgbDF['month'] = pd.Categorical(rgbDF['FirstReceivedDate'].dt.strftime('%b'), 
                                 categories=vals, ordered=True) 

gA = rgbDF.groupby(['month','Call']) \
          .agg({'TotalGrantValue':'sum', 'FirstReceivedDate':'count'}) \
          .rename(columns={'FirstReceivedDate':'Count'})
gA['TotalGrantValue'] = gA['TotalGrantValue'].map('{:,.2f}'.format)

样品:

print (gA)
           TotalGrantValue  Count
month Call                       
Apr   a               1.00      2
      d              19.00      1
May   a               2.00      1
Jun   a               7.00      2
Jul   b               5.00      1
Aug   b              13.00      2
Sep   b              17.00      2
Oct   c              10.00      1
Nov   c              23.00      2
Dec   c              13.00      1
Jan   c              14.00      1
      d              15.00      1
Feb   d              16.00      1
Mar   d              35.00      2
df = gA.unstack(0).swaplevel(0,1,1).sort_index(1)
print (df)
month   Apr                   May                   Jun                   Jul  \
      Count TotalGrantValue Count TotalGrantValue Count TotalGrantValue Count   
Call                                                                            
a       2.0            1.00   1.0            2.00   2.0            7.00   NaN   
b       NaN            None   NaN            None   NaN            None   1.0   
c       NaN            None   NaN            None   NaN            None   NaN   
d       1.0           19.00   NaN            None   NaN            None   NaN   

month                   Aug                       ...         Nov  \
      TotalGrantValue Count TotalGrantValue       ...       Count   
Call                                              ...               
a                None   NaN            None       ...         NaN   
b                5.00   2.0           13.00       ...         NaN   
c                None   NaN            None       ...         2.0   
d                None   NaN            None       ...         NaN   

month                   Dec                   Jan                   Feb  \
      TotalGrantValue Count TotalGrantValue Count TotalGrantValue Count   
Call                                                                      
a                None   NaN            None   NaN            None   NaN   
b                None   NaN            None   NaN            None   NaN   
c               23.00   1.0           13.00   1.0           14.00   NaN   
d                None   NaN            None   1.0           15.00   1.0   

month                   Mar                  
      TotalGrantValue Count TotalGrantValue  
Call                                         
a                None   NaN            None  
b                None   NaN            None  
c                None   NaN            None  
d               16.00   2.0           35.00  

[4 rows x 24 columns]
SearchDescriptor<object> SearchAgg = new SearchDescriptor<object>();
for (i=0;i < aggList.length;i++)
{
    SearchAgg.Aggregations(a => a.terms (aggList[i]), t=> t.Field(aggList[i]));
}

答案 1 :(得分:1)

您可以使用new_df01功能

直接从pd.pivot_table()尝试
    data_p = pd.pivot_table(new_df01, values=['TotalGrantValue'], index=['Call'], columns=['month'], aggfunc=('count', 'mean'))