Running total in a pandas pivot table (Python)

Time: 2020-09-09 21:58:33

Tags: python python-3.x pandas pivot-table

I have a DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame({'publisher': ['facebook', 'facebook', 'facebook', 'google', 'google', 'google'],
                   'month_leadgen': ['2019-01', '2019-01', '2019-01', '2019-02', '2019-02', '2019-03'],
                   'month_payment': ['2019-01', '2019-02', '2019-03', '2019-02', '2019-03', '2019-03'],
                   'revenue': [60, 25, 45, 85, 90, 60]})

I created a pivot table:

df = df.pivot_table(index=['publisher', 'month_leadgen'], columns=['month_payment'], values=['revenue']).reset_index()

              publisher month_leadgen revenue
month_payment                         2019-01 2019-02 2019-03
0              facebook       2019-01    60.0    25.0    45.0
1                google       2019-02     NaN    85.0    90.0
2                google       2019-03     NaN     NaN    60.0

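My expected output is a running total summed month by month. So for facebook, I would like to see 85.0 in the 2019-02 column (month 1 + month 2) and 130.0 in the 2019-03 column (month 1 + month 2 + month 3). Thanks.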
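One way to get exactly these numbers is to compute the running total in long format with a grouped cumsum and then pivot the result. This is a sketch of that approach, not a method taken from the answers below. It assumes each (publisher, month_leadgen) pair has at most one row per month_payment, and that the 'YYYY-MM' strings sort chronologically. The column name running_revenue is hypothetical.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({'publisher': ['facebook', 'facebook', 'facebook', 'google', 'google', 'google'],
                   'month_leadgen': ['2019-01', '2019-01', '2019-01', '2019-02', '2019-02', '2019-03'],
                   'month_payment': ['2019-01', '2019-02', '2019-03', '2019-02', '2019-03', '2019-03'],
                   'revenue': [60, 25, 45, 85, 90, 60]})

# Sort so the cumulative sum runs in payment-month order inside each group
df = df.sort_values(['publisher', 'month_leadgen', 'month_payment'])

# Running total per (publisher, month_leadgen) cohort, computed before pivoting
# (running_revenue is a hypothetical column name)
df['running_revenue'] = df.groupby(['publisher', 'month_leadgen'])['revenue'].cumsum()

# Pivot the running totals so each payment month becomes its own column
running = df.pivot_table(index=['publisher', 'month_leadgen'],
                         columns='month_payment',
                         values='running_revenue')
print(running)
```

A payment month with no row for a cohort still shows NaN here, rather than repeating the previous total.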

2 answers:

Answer 0 (score: 0)

Let's try .cumsum() along axis=1, i.e. cumsum(1):

df=df.join(df[['revenue']].cumsum(1).rename(columns=dict(revenue='Cumsum')))



              publisher month_leadgen revenue                 Cumsum
month_payment                         2019-01 2019-02 2019-03 2019-01 2019-02 2019-03
0              facebook       2019-01    60.0    25.0    45.0    60.0    85.0   130.0
1                google       2019-02     NaN    85.0    90.0     NaN    85.0   175.0
2                google       2019-03     NaN     NaN    60.0     NaN     NaN    60.0
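To check the output above end to end, here is a self-contained reproduction of this answer. It uses the question's data and pivot. Selecting with double brackets, df[['revenue']], keeps the top-level 'revenue' label, so rename can turn it into 'Cumsum' before the join. The variable name running is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'publisher': ['facebook', 'facebook', 'facebook', 'google', 'google', 'google'],
                   'month_leadgen': ['2019-01', '2019-01', '2019-01', '2019-02', '2019-02', '2019-03'],
                   'month_payment': ['2019-01', '2019-02', '2019-03', '2019-02', '2019-03', '2019-03'],
                   'revenue': [60, 25, 45, 85, 90, 60]})

# Pivot as in the question; values=['revenue'] (a list) gives two-level column labels
df = df.pivot_table(index=['publisher', 'month_leadgen'], columns=['month_payment'],
                    values=['revenue']).reset_index()

# Cumulative sum across the columns (axis=1) of the 'revenue' block only.
# NaN cells are skipped by the running sum but stay NaN in the result.
running = df[['revenue']].cumsum(axis=1).rename(columns={'revenue': 'Cumsum'})

df = df.join(running)
print(df)
```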

Alternatively, do it at the pivot stage:

# Start from the original (unpivoted) df; cumsum(axis=1) runs across the payment-month columns
df2 = df.pivot_table(index=['month_leadgen', 'publisher'], columns=['month_payment'], values=['revenue']).cumsum(axis=1).reset_index()
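Here is a small variation on the pivot-stage approach. It passes values as a plain string instead of a list, so the result has one flat column per payment month and no two-level header. This is a sketch that assumes the 'YYYY-MM' column labels, which pivot_table sorts, are already in chronological order. The name running is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'publisher': ['facebook', 'facebook', 'facebook', 'google', 'google', 'google'],
                   'month_leadgen': ['2019-01', '2019-01', '2019-01', '2019-02', '2019-02', '2019-03'],
                   'month_payment': ['2019-01', '2019-02', '2019-03', '2019-02', '2019-03', '2019-03'],
                   'revenue': [60, 25, 45, 85, 90, 60]})

# values='revenue' (a string, not a list) keeps the columns single-level
running = (df.pivot_table(index=['publisher', 'month_leadgen'],
                          columns='month_payment',
                          values='revenue')
             .cumsum(axis=1)      # running total across payment months
             .reset_index())      # move publisher/month_leadgen back to columns
print(running)
```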

Answer 1 (score: 0)

Just define the dataset with the publisher as the index:

df = pd.DataFrame({'month_leadgen': ['2019-01', '2019-01', '2019-01', '2019-02', '2019-02', '2019-03'],
                   'month_payment': ['2019-01', '2019-02', '2019-03', '2019-02', '2019-03', '2019-03'],
                   'revenue': [60, 25, 45, 85, 90, 60]},
                  index=['facebook', 'facebook', 'facebook', 'google', 'google', 'google'])

Then run the following line:

df['TotalShifted'] = df.groupby(level=0)['revenue'].transform(lambda x: x.cumsum().shift(0))

You will get:

         month_leadgen month_payment  revenue  TotalShifted
facebook       2019-01       2019-01       60            60
facebook       2019-01       2019-02       25            85
facebook       2019-01       2019-03       45           130
google         2019-02       2019-02       85            85
google         2019-02       2019-03       90           175
google         2019-03       2019-03       60           235
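Because shift(0) returns the values unchanged, this answer's transform is equivalent to a plain grouped cumsum. Grouping only by the publisher index also carries google's total across its two lead-gen months (hence 235). If you want totals that restart per (publisher, month_leadgen), as in the pivot-table answers, you can also group by month_leadgen. This is a sketch that assumes rows are already in payment-month order within each group. The column name CohortTotal is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'month_leadgen': ['2019-01', '2019-01', '2019-01', '2019-02', '2019-02', '2019-03'],
                   'month_payment': ['2019-01', '2019-02', '2019-03', '2019-02', '2019-03', '2019-03'],
                   'revenue': [60, 25, 45, 85, 90, 60]},
                  index=['facebook', 'facebook', 'facebook', 'google', 'google', 'google'])

# Same result as transform(lambda x: x.cumsum().shift(0)), since shift(0) is a no-op
df['TotalShifted'] = df.groupby(level=0)['revenue'].cumsum()

# Grouping by publisher (the index) AND month_leadgen restarts the total per cohort
# (CohortTotal is a hypothetical column name)
df['CohortTotal'] = df.groupby([df.index, 'month_leadgen'])['revenue'].cumsum()
print(df)
```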
