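I'm querying a database and populating a pandas DataFrame. I'm struggling to aggregate the data (via groupby) and then manipulate the DataFrame index so that the dates in the table become the index. Below is an example of what the data looks like before and after the groupby, and what I'm ultimately looking for.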
The DataFrame as populated from the query:
```
firm  dates       received  Sent
--------------------------------
A     10/08/2016         2     8
A     12/08/2016         4     2
B     10/08/2016         1     0
B     11/08/2016         3     5
A     13/08/2016         5     1
C     14/08/2016         7     3
B     14/08/2016         2     5
```
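To make the example reproducible, here is a minimal sketch that builds the sample data above as a DataFrame. It assumes the query returns `dates` as `dd/mm/yyyy` strings, exactly as shown, and uses the column names from the table header.

```python
import pandas as pd

# Sample data from the table above; dates kept as dd/mm/yyyy strings
# (assumption: this is how the MySQL query returns them).
df = pd.DataFrame({
    "firm": ["A", "A", "B", "B", "A", "C", "B"],
    "dates": ["10/08/2016", "12/08/2016", "10/08/2016", "11/08/2016",
              "13/08/2016", "14/08/2016", "14/08/2016"],
    "received": [2, 4, 1, 3, 5, 7, 2],
    "Sent": [8, 2, 0, 5, 1, 3, 5],
})
print(df)
```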
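First, I want to group by "firm" and "dates" and aggregate "received" and "sent".
Then manipulate the DataFrame so that the dates become the column index rather than the row index.
Finally, add totals for each day (the "Totals r." and "Totals s." rows below).
Some firms have no "activity" on some days, or at least no received or sent activity. However, because I want to look at the past X days, I can't leave those values empty; they need to be filled with zeros instead.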
Desired result:

```
dates           10/08/2016  11/08/2016  12/08/2016  13/08/2016  14/08/2016
firm
--------------------------------------------------------------------------
A     received           2           0           4           5           0
      sent               8           0           2           1           0
B     received           1           3           1           0           2
      sent               0           5           0           0           5
C     received           0           0           2           0           1
      sent               0           0           1           2           0
Totals r.                3           3           7           5           3
Totals s.                8           0           3           3           5
```
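The three steps can also be expressed with `melt` followed by `pivot_table`. This is a sketch under assumptions, not the only approach: it rebuilds the sample data, names the received/Sent level with a hypothetical `kind` label, sums the values per group, and uses `fill_value=0` for the days a firm had no activity. The totals are computed from the summed rows and labelled with a `total` firm.

```python
import pandas as pd

# Sample data from the question (dates as dd/mm/yyyy strings).
df = pd.DataFrame({
    "firm": ["A", "A", "B", "B", "A", "C", "B"],
    "dates": ["10/08/2016", "12/08/2016", "10/08/2016", "11/08/2016",
              "13/08/2016", "14/08/2016", "14/08/2016"],
    "received": [2, 4, 1, 3, 5, 7, 2],
    "Sent": [8, 2, 0, 5, 1, 3, 5],
})

# Long format: one row per (firm, dates, kind); "kind" and "amount" are hypothetical names.
long = df.melt(id_vars=["firm", "dates"], value_vars=["received", "Sent"],
               var_name="kind", value_name="amount")

# Steps 1 and 2: sum per (firm, kind) with dates spread across the columns;
# fill_value=0 turns firm/day combinations without rows into zeros.
wide = long.pivot_table(index=["firm", "kind"], columns="dates",
                        values="amount", aggfunc="sum", fill_value=0)

# Step 3: per-day totals for each kind, labelled with a "total" firm.
totals = wide.groupby(level="kind").sum()
totals.index = pd.MultiIndex.from_product([["total"], totals.index],
                                          names=["firm", "kind"])

result = pd.concat([wide, totals])
print(result)
```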
I've tried the following code:
```python
import pandas as pd

# df = <result of the MySQL query>
n_received = df.groupby(["firm", "dates"]).received.size()
n_sent = df.groupby(["firm", "dates"]).sent.size()

tables = pd.DataFrame({'received': n_received, 'sent': n_sent},
                      columns=['received', 'sent'])

this = pd.melt(tables,
               id_vars=['dates', 'firm', 'received', 'sent'])
this = this.set_index(['dates', 'firm', 'received', 'sent', 'var'])
this = this.unstack('dates').fillna(0)
this.columns = this.columns.droplevel()
this.columns.name = ''
this = this.transpose()
```
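One reason the attempt above returns the wrong numbers: `.size()` counts the rows in each group instead of adding up their values. The sketch below uses small hypothetical data with two rows for the same firm and day to show the difference; with the question's sample data, where each firm/day pair appears only once, `.size()` would return 1 for every group.

```python
import pandas as pd

# Hypothetical data: firm A has two rows on 10/08/2016.
df = pd.DataFrame({
    "firm": ["A", "A", "B"],
    "dates": ["10/08/2016", "10/08/2016", "11/08/2016"],
    "received": [2, 4, 3],
})

# size(): number of rows per (firm, dates) group, regardless of the values.
counts = df.groupby(["firm", "dates"])["received"].size()
# sum(): total of the received values per (firm, dates) group.
totals = df.groupby(["firm", "dates"])["received"].sum()
print(counts)
print(totals)
```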
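Basically, this code doesn't give me the result I want.

- How can I achieve this?
- Conceptually, is there a better way to get this result? For example, should the aggregation be done in the SQL statement, or does aggregating in pandas make more sense, logically or from an optimization standpoint?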
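Doing the aggregation in SQL is one option. The sketch below assumes an in-memory SQLite database as a stand-in for MySQL and a hypothetical table name `activity`; the `GROUP BY` query is standard SQL and should carry over to MySQL, but check it against your actual schema. The reshaping into a wide, zero-filled layout still happens in pandas.

```python
import sqlite3

import pandas as pd

# In-memory SQLite standing in for the MySQL server (assumption);
# "activity" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity (firm TEXT, dates TEXT, received INTEGER, Sent INTEGER)"
)
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?, ?)",
    [("A", "10/08/2016", 2, 8), ("A", "12/08/2016", 4, 2),
     ("B", "10/08/2016", 1, 0), ("B", "11/08/2016", 3, 5),
     ("A", "13/08/2016", 5, 1), ("C", "14/08/2016", 7, 3),
     ("B", "14/08/2016", 2, 5)],
)

# Aggregate in the database: one row per (firm, dates) reaches pandas.
query = """
    SELECT firm, dates, SUM(received) AS received, SUM(Sent) AS Sent
    FROM activity
    GROUP BY firm, dates
    ORDER BY firm, dates
"""
agg = pd.read_sql_query(query, conn)
conn.close()

# Reshape in pandas: dates as columns, zeros for missing firm/day pairs.
wide = (agg.set_index(["firm", "dates"])
           .stack()
           .unstack(level="dates")
           .fillna(0))
print(wide)
```

Pushing the `GROUP BY` into the database reduces the number of rows sent over the connection, which tends to matter more as the table grows, while pivoting is usually simpler to express in pandas. Which is faster for a given dataset is worth measuring rather than assuming.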
Answer
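You can use `stack` and `unstack` to reshape the data: `stack` moves column labels into the row index (wide to long), and `unstack` moves a row-index level into the columns (long to wide):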
```python
import pandas as pd

# calculate the total received and sent, grouped by dates
df1 = df.drop('firm', axis=1).groupby('dates').sum().reset_index()
# add "total" as the firm label for the summary rows
df1['firm'] = 'total'
# concatenate the summary data frame with the original one, then use stack and unstack
# so that dates appear as columns while received and Sent are stacked as rows
pd.concat([df, df1]).set_index(['firm', 'dates']).stack().unstack(level=1).fillna(0)

# dates           10/08/2016  11/08/2016  12/08/2016  13/08/2016  14/08/2016
# firm
# A     Sent             8.0         0.0         2.0         1.0         0.0
#       received         2.0         0.0         4.0         5.0         0.0
# B     Sent             0.0         5.0         0.0         0.0         5.0
#       received         1.0         3.0         0.0         0.0         2.0
# C     Sent             0.0         0.0         0.0         0.0         3.0
#       received         0.0         0.0         0.0         0.0         7.0
# total Sent             8.0         5.0         2.0         1.0         8.0
#       received         3.0         3.0         4.0         5.0         9.0
```
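The result above only contains days that appear somewhere in the data, and the `dd/mm/yyyy` labels sort as text, which breaks across month boundaries. For the "past X days" requirement, one possible follow-up is to parse the column labels as dates and reindex over a full date range. This sketch reuses the answer's reshaping and assumes a hypothetical window of `X = 6` days ending on the latest date in the data.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "firm": ["A", "A", "B", "B", "A", "C", "B"],
    "dates": ["10/08/2016", "12/08/2016", "10/08/2016", "11/08/2016",
              "13/08/2016", "14/08/2016", "14/08/2016"],
    "received": [2, 4, 1, 3, 5, 7, 2],
    "Sent": [8, 2, 0, 5, 1, 3, 5],
})

# The answer's reshaping, unchanged.
df1 = df.drop('firm', axis=1).groupby('dates').sum().reset_index()
df1['firm'] = 'total'
result = (pd.concat([df, df1]).set_index(['firm', 'dates'])
            .stack().unstack(level=1).fillna(0))

# Parse the dd/mm/yyyy labels so the columns sort chronologically.
result.columns = pd.to_datetime(result.columns, format="%d/%m/%Y")

# Hypothetical window: the last X days ending on the latest date present.
X = 6
window = pd.date_range(end=result.columns.max(), periods=X, freq="D")
# Days with no activity at all become all-zero columns.
result = result.reindex(columns=window, fill_value=0)
print(result)
```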