Manipulating a pandas DataFrame

Time: 2016-08-14 20:50:35

Tags: python datetime pandas data-manipulation

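I am querying a database and populating a pandas DataFrame. I am struggling to aggregate the data (via groupby) and then manipulate the DataFrame index so that the dates in the table become the index (the column headers, not the row index). Below is an example of what the data looks like before and after the groupby, and what I am ultimately looking for.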

DataFrame - populated data

firm |    dates    | received | Sent
-----------------------------------------
A       10/08/2016      2         8
A       12/08/2016      4         2
B       10/08/2016      1         0
B       11/08/2016      3         5
A       13/08/2016      5         1
C       14/08/2016      7         3 
B       14/08/2016      2         5
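
For anyone who wants to reproduce the problem without a database, the table above can be built by hand. This is only a sketch: the real `df` comes from a MySQL query, and the column spelling (`Sent` with a capital S) simply follows the table header.

```python
import pandas as pd

# Rows copied from the table above; dates are kept as day-first strings,
# exactly as they are shown there.
df = pd.DataFrame({
    "firm": ["A", "A", "B", "B", "A", "C", "B"],
    "dates": ["10/08/2016", "12/08/2016", "10/08/2016", "11/08/2016",
              "13/08/2016", "14/08/2016", "14/08/2016"],
    "received": [2, 4, 1, 3, 5, 7, 2],
    "Sent": [8, 2, 0, 5, 1, 3, 5],
})

print(df)
```
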
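  1. First, I want to group by "firm" and "dates" and "received/sent".

  2. Then manipulate the DataFrame so that the dates become the index, rather than the row index.

  3. Finally, add a totals row for each day.

  4. Some firms have no 'activity' on some days, or at least no activity in either received or sent. However, since I want to look at the last X days, null values are not usable; the gaps need to be filled with zeros instead.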

Desired result:

    firm    |          | 10/08/2016 | 11/08/2016 | 12/08/2016 | 13/08/2016 | 14/08/2016
    ------------------------------------------------------------------------------------
    A         received       2            0            4            5            0
              sent           8            0            2            1            0

    B         received       1            3            1            0            2
              sent           0            5            0            0            5

    C         received       0            0            2            0            1
              sent           0            0            1            2            0

    Totals    received       3            3            7            5            3
              sent           8            0            3            3            5

I tried the following code:

    df = > mysql query result
    
    n_received = df.groupby(["firm", "dates"
                                    ]).received.size()
    
    n_sent = df.groupby(["firm", "dates"
                                    ]).sent.size()
    
    tables = pd.DataFrame({ 'received': n_received, 'sent': n_sent,
                               }, 
                                columns=['received','sent'])
    
    this = pd.melt(tables, 
                        id_vars=['dates', 
                                 'firm',
                                 'received', 'sent']
    
    this = this.set_index(['dates', 
                             'firm',
                             'received', 'sent'
                        'var'
                        ])        
    this = this.unstack('dates').fillna(0)     
    
    this.columns = this.columns.droplevel()
    
    this.columns.name = ''
    
    this = this.transpose()
    

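Basically, this code does not give me the result I want.

  - How can I achieve this?
  - Conceptually, is there a better way to get this result? For example, would it make more sense to do the aggregation in the SQL statement, or is aggregating in pandas the better choice, both for optimization and logically?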

1 answer:

Answer 0: (score: 0)

You can use stack (unstack) to convert the data from wide to long (long to wide) format:

import pandas as pd
# calculate the total received and sent grouped by dates
df1 = df.drop('firm', axis = 1).groupby('dates').sum().reset_index()

# add total category as the firm column
df1['firm'] = 'total'

# concatenate the summary rows with the original data frame, then use stack and unstack
# so that dates become the columns while received and Sent become an inner level of the row index.
pd.concat([df, df1]).set_index(['firm', 'dates']).stack().unstack(level = 1).fillna(0)

# dates         10/08/2016  11/08/2016  12/08/2016  13/08/2016  14/08/2016
#  firm                     
#     A     Sent       8.0         0.0         2.0         1.0         0.0
#       received       2.0         0.0         4.0         5.0         0.0
#     B     Sent       0.0         5.0         0.0         0.0         5.0
#       received       1.0         3.0         0.0         0.0         2.0
#     C     Sent       0.0         0.0         0.0         0.0         3.0
#       received       0.0         0.0         0.0         0.0         7.0
# total     Sent       8.0         5.0         2.0         1.0         8.0
#       received       3.0         3.0         4.0         5.0         9.0
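
The same reshaping can also be sketched with `melt` and `pivot_table`, which may be easier to adapt to the "last X days" requirement from the question. Several things here are assumptions rather than part of the original answer: the sample frame is rebuilt by hand from the question's table, the window is taken to be the 5 days ending on 14/08/2016, and the names `kind`, `count`, `window` and `result` are purely illustrative.

```python
import pandas as pd

# Sample data copied from the question's input table (assumption: the real
# frame comes from a database query with the same four columns).
df = pd.DataFrame({
    "firm": ["A", "A", "B", "B", "A", "C", "B"],
    "dates": ["10/08/2016", "12/08/2016", "10/08/2016", "11/08/2016",
              "13/08/2016", "14/08/2016", "14/08/2016"],
    "received": [2, 4, 1, 3, 5, 7, 2],
    "Sent": [8, 2, 0, 5, 1, 3, 5],
})

# Parse the day-first strings so that the columns sort chronologically.
df["dates"] = pd.to_datetime(df["dates"], format="%d/%m/%Y")

# Long format: one row per (firm, date, kind of activity).
long = df.melt(id_vars=["firm", "dates"], var_name="kind", value_name="count")

# Wide format: dates as columns, zeros where a firm had no activity.
wide = long.pivot_table(index=["firm", "kind"], columns="dates",
                        values="count", aggfunc="sum", fill_value=0)

# Guarantee one column per day in the window, even for days on which
# no firm had any activity (assumed window: 5 days ending 2016-08-14).
window = pd.date_range(end="2016-08-14", periods=5, freq="D")
wide = wide.reindex(columns=window, fill_value=0)

# Per-day totals for each kind, appended under the firm label "total".
totals = wide.groupby(level="kind").sum()
totals.index = pd.MultiIndex.from_product([["total"], totals.index],
                                          names=["firm", "kind"])
result = pd.concat([wide, totals])

print(result)
```

Because the dates are parsed before reshaping, the columns stay in calendar order even when the window spans a month boundary, which plain `dd/mm/yyyy` strings would not guarantee. Widening `window` adds all-zero columns for quiet days instead of dropping them.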