Question

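I am querying a database and populating a pandas DataFrame. I am struggling to aggregate the data (via groupby) and then manipulate the DataFrame index so that the dates in the table become the index. Below is an example of what the data looks like before and after the groupby, and what I am ultimately looking for.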

DataFrame - populated data

firm |    dates    | received | Sent
-----------------------------------------
A       10/08/2016      2         8
A       12/08/2016      4         2
B       10/08/2016      1         0
B       11/08/2016      3         5
A       13/08/2016      5         1
C       14/08/2016      7         3 
B       14/08/2016      2         5

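First, I want to group by "firm", "dates", and "received/sent".
Then manipulate the DataFrame so that the dates become the column headers rather than the row index.
Finally, add a total for each day.
Some firms have no 'activity' on certain days, or at least no received or sent activity. However, since I want a view of the last X days, null values are not acceptable; the gaps need to be filled with zeros.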

dates        | 10/08/2016 | 11/08/2016| 12/08/2016| 13/08/2016| 14/08/2016    
firm  |  
----------------------------------------------------------------------
A      received     2           0            4            5          0
       sent         8           0            2            1          0

B      received     1           3            1            0          2
       sent         0           5            0            0          5

C      received     0           0            2            0          1
       sent         0           0            1            2          0

Totals r.           3           3            7            5          3             
Totals s.           8           0            3            3          5

I have tried the following code:

df = > mysql query result

n_received = df.groupby(["firm", "dates"
                                ]).received.size()

n_sent = df.groupby(["firm", "dates"
                                ]).sent.size()

tables = pd.DataFrame({ 'received': n_received, 'sent': n_sent,
                           }, 
                            columns=['received','sent'])

this = pd.melt(tables, 
                    id_vars=['dates', 
                             'firm',
                             'received', 'sent']

this = this.set_index(['dates', 
                         'firm',
                         'received', 'sent'
                    'var'
                    ])        
this = this.unstack('dates').fillna(0)     

this.columns = this.columns.droplevel()

this.columns.name = ''

this = this.transpose()

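Basically, this code does not give me the result I want. How can I achieve this? Conceptually, is there a better way to get this result, for example by doing the aggregation in the SQL statement, or does aggregating in pandas make more sense from an optimization and logic standpoint?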

Answer 1

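You can use stack and unstack to reshape the data between wide and long formats (stack moves columns into the row index, unstack moves an index level into the columns):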

import pandas as pd
# calculate the total received and sent grouped by dates
df1 = df.drop('firm', axis = 1).groupby('dates').sum().reset_index()

# add total category as the firm column
df1['firm'] = 'total'

# concatenate the original and summary data frames, then use stack and unstack so that
# dates become the columns while received and sent are stacked as rows under each firm.
pd.concat([df, df1]).set_index(['firm', 'dates']).stack().unstack(level = 1).fillna(0)

# dates         10/08/2016  11/08/2016  12/08/2016  13/08/2016  14/08/2016
#  firm                     
#     A     Sent       8.0         0.0         2.0         1.0         0.0
#       received       2.0         0.0         4.0         5.0         0.0
#     B     Sent       0.0         5.0         0.0         0.0         5.0
#       received       1.0         3.0         0.0         0.0         2.0
#     C     Sent       0.0         0.0         0.0         0.0         3.0
#       received       0.0         0.0         0.0         0.0         7.0
# total     Sent       8.0         5.0         2.0         1.0         8.0
#       received       3.0         3.0         4.0         5.0         9.0
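
The question also asks for a view of the last X days with zeros on days that have no activity at all, which the stack/unstack result above only covers for dates that appear in the data. A minimal sketch of one way to handle that follows, using melt and pivot_table instead of stack/unstack. It assumes the column is named sent in lower case (as in the question's code), that the dates are day-first strings, and that the window ends at the latest date in the data; last_x_days, kind and n are hypothetical names chosen for illustration.

```python
import pandas as pd

# Sample data copied from the question (the "Sent" column is lower-cased here).
df = pd.DataFrame({
    "firm": ["A", "A", "B", "B", "A", "C", "B"],
    "dates": ["10/08/2016", "12/08/2016", "10/08/2016", "11/08/2016",
              "13/08/2016", "14/08/2016", "14/08/2016"],
    "received": [2, 4, 1, 3, 5, 7, 2],
    "sent": [8, 2, 0, 5, 1, 3, 5],
})

# Parse the day-first strings so the date columns sort chronologically.
df["dates"] = pd.to_datetime(df["dates"], format="%d/%m/%Y")

# Long format: one row per (firm, date, kind), where kind is received or sent.
long = df.melt(id_vars=["firm", "dates"], value_vars=["received", "sent"],
               var_name="kind", value_name="n")

# Wide format: dates as columns, summed per firm and kind, gaps filled with 0.
wide = long.pivot_table(index=["firm", "kind"], columns="dates",
                        values="n", aggfunc="sum", fill_value=0)

# Hypothetical window: the last 7 days ending at the latest date in the data.
last_x_days = 7
days = pd.date_range(end=df["dates"].max(), periods=last_x_days, freq="D")

# Days on which no firm had any activity become all-zero columns.
wide = wide.reindex(columns=days, fill_value=0)

# Daily totals per kind, appended under a "total" pseudo-firm.
totals = wide.groupby(level="kind").sum()
totals.index = pd.MultiIndex.from_product([["total"], totals.index],
                                          names=wide.index.names)
result = pd.concat([wide, totals])

print(result)
```

Because aggfunc="sum" is used, several rows for the same firm and date are added together rather than raising an error, which is something set_index followed by unstack would do on duplicate index entries.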
