Pandas: multiple totals in a summary DataFrame

Time: 2016-06-11 19:10:29

Tags: numpy pandas dataframe plyr aggregation

Apologies for the noob question; I am still learning Python. I look forward to getting up to speed and giving back.

Suppose I have the following data:

YEAR        SECTOR      PROFIT  STARTMVYEAR  TOTALPROFIT  STARTMV
IBM         TECHNOLOGY    -500         2500          500     1500
APPLE       TECHNOLOGY     800         4000          300     4500
GM          INDUSTRIAL     250         1000            0     1250
CHRYSLER    INDUSTRIAL     600         3000          100     3500

I would like to create a summary that looks like this:

SECTOR      PROFITYEAR  TOTALPROFIT
TECHNOLOGY     .046       .133
INDUSTRIAL     .213       .021

where, for each group, PROFITYEAR is sum(PROFIT)/sum(STARTMVYEAR) and TOTALPROFIT is sum(TOTALPROFIT)/sum(STARTMV).

If I only wanted the first ratio, I could do:

by_profit_totals = df.groupby('SECTOR')['PROFIT'].sum() / df.groupby('SECTOR')['STARTMVYEAR'].sum()

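But how can I do this for both? Also, is there a simple function I could use that takes, for example, PROFIT and STARTMVYEAR and returns the aggregated value?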

1 answer:

Answer 0: (score: 1)

You can use groupby with the Cython-optimized sum aggregation, then divide with div by the values created (https://plnkr.co/edit/9rfHtE0PHXPhC5Kcyb7P):

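# Sum the columns within each sector.
g = df.groupby('SECTOR').sum()
# Divide by the raw values so the columns are matched by position, not by label.
print(g[['PROFIT', 'TOTALPROFIT']].div(g[['STARTMVYEAR', 'STARTMV']].values).reset_index())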
       SECTOR    PROFIT  TOTALPROFIT
0  INDUSTRIAL  0.212500     0.021053
1  TECHNOLOGY  0.046154     0.133333
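
To address the second part of the question (a simple function that takes columns such as PROFIT and STARTMVYEAR and returns the aggregated ratio), here is a minimal sketch built on the same groupby/sum/div approach. The helper name ratio_summary and its parameters are hypothetical, not part of pandas, and the DataFrame is rebuilt from the sample data in the question so that the example is self-contained.

```python
import pandas as pd

# Sample data from the question; the first column is named YEAR there,
# although it holds company names.
df = pd.DataFrame({
    "YEAR": ["IBM", "APPLE", "GM", "CHRYSLER"],
    "SECTOR": ["TECHNOLOGY", "TECHNOLOGY", "INDUSTRIAL", "INDUSTRIAL"],
    "PROFIT": [-500, 800, 250, 600],
    "STARTMVYEAR": [2500, 4000, 1000, 3000],
    "TOTALPROFIT": [500, 300, 0, 100],
    "STARTMV": [1500, 4500, 1250, 3500],
})


def ratio_summary(frame, by, pairs):
    """Hypothetical helper: for each (numerator, denominator) pair,
    return sum(numerator) / sum(denominator) per group."""
    numerators = [num for num, _ in pairs]
    denominators = [den for _, den in pairs]
    # Sum only the columns that are needed, so text columns are never summed.
    sums = frame.groupby(by)[numerators + denominators].sum()
    # .to_numpy() drops the column labels, so the division is positional
    # (PROFIT / STARTMVYEAR, TOTALPROFIT / STARTMV) rather than label-aligned.
    return sums[numerators].div(sums[denominators].to_numpy()).reset_index()


summary = ratio_summary(
    df, "SECTOR", [("PROFIT", "STARTMVYEAR"), ("TOTALPROFIT", "STARTMV")]
)
print(summary.round(3))
```

The result columns keep the numerator names (PROFIT, TOTALPROFIT), as in the answer's output above. The sketch assumes that no column is used as both a numerator and a denominator; in that case the selection would contain duplicate column labels and the helper would need adjusting.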