Pandas groupby: calculating differences

Time: 2020-03-12 14:35:16

Tags: pandas pandas-groupby

import pandas as pd
data = [['2017-09-30','A',123],['2017-12-31','A',23],['2017-09-30','B',74892],['2017-12-31','B',52222],['2018-09-30','A',37599],['2018-12-31','A',66226]]

df = pd.DataFrame.from_records(data,columns=["Date", "Company", "Revenue YTD"])
df['Date'] = pd.to_datetime(df['Date'])

df = df.groupby(['Company',df['Date'].dt.year]).diff()
print(df)


     Date  Revenue YTD
0     NaT          NaN
1 92 days       -100.0
2     NaT          NaN
3 92 days     -22670.0
4     NaT          NaN
5 92 days      28627.0

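I want to calculate each company's revenue difference between September and December of the same year. I tried grouping by company and year, but the result is not what I expected: the Company column is gone and Date has been differenced as well.

Expected result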

     Date         Company   Revenue YTD
0    2017            A           -100
1    2018            A          28627
2    2017            B         -22670

2 answers:

Answer 0 (score: 1)

IIUC, this should work:

(df.assign(Date=df['Date'].dt.year,
           Revenue_Diff=df.groupby(['Company',df['Date'].dt.year])['Revenue YTD'].diff())
   .drop('Revenue YTD', axis=1)
   .dropna()
)

Output:

   Date Company  Revenue_Diff
1  2017       A        -100.0
3  2017       B      -22670.0
5  2018       A       28627.0
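
As a possible illustration of why the attempt in the question went wrong: calling diff() on the whole grouped frame differences every non-grouping column, Date included, and drops the grouping column. Selecting 'Revenue YTD' before diff() avoids that. The sketch below is a variant of the answer above, not a replacement for it; the names year, diff and result are illustrative, and it assumes each (Company, year) group holds the September row before the December row.

```python
import pandas as pd

data = [['2017-09-30', 'A', 123], ['2017-12-31', 'A', 23],
        ['2017-09-30', 'B', 74892], ['2017-12-31', 'B', 52222],
        ['2018-09-30', 'A', 37599], ['2018-12-31', 'A', 66226]]

df = pd.DataFrame(data, columns=["Date", "Company", "Revenue YTD"])
df["Date"] = pd.to_datetime(df["Date"])

# Year of each row, used as a grouping key alongside Company
year = df["Date"].dt.year.rename("Year")

# Select the revenue column first so only it is differenced
diff = df.groupby(["Company", year])["Revenue YTD"].diff()

# Rebuild a tidy frame; the first row of each group has no predecessor (NaN)
result = (
    pd.DataFrame({"Date": year, "Company": df["Company"], "Revenue_Diff": diff})
    .dropna()
    .sort_values(["Company", "Date"])
    .reset_index(drop=True)
)
print(result)
```

For the sample data this leaves three rows, one per company and year, with the differences -100, 28627 and -22670 for (A, 2017), (A, 2018) and (B, 2017).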

Answer 1 (score: 0)

Try this:

Setup:

import pandas as pd
import numpy as np

data = [['2017-09-30','A',123],['2017-12-31','A',23],['2017-09-30','B',74892],['2017-12-31','B',52222],['2018-09-30','A',37599],['2018-12-31','A',66226]]

df = pd.DataFrame.from_records(data,columns=["Date", "Company", "Revenue YTD"])
df['Date'] = pd.to_datetime(df['Date'])

Update using np.diff():

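# np.diff returns an array; take its single element so agg gets one scalar
# per group (assumes exactly two rows per group)
my_func = lambda x: np.diff(x)[0]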

df = (df.groupby([df.Date.dt.year, df.Company])
         .agg({'Revenue YTD':my_func}))

print(df)

              Revenue YTD
Date Company
2017 A               -100
     B             -22670
2018 A              28627
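
Both diff-based approaches depend on the row order inside each group. If that order is not guaranteed, one option is to make "December minus September" explicit by spreading the months into columns. This is only a sketch under assumptions: the Year and Month helper columns and the names wide and result are made up for illustration, and every (Company, Year) pair is assumed to have exactly one September row and one December row.

```python
import pandas as pd

data = [['2017-09-30', 'A', 123], ['2017-12-31', 'A', 23],
        ['2017-09-30', 'B', 74892], ['2017-12-31', 'B', 52222],
        ['2018-09-30', 'A', 37599], ['2018-12-31', 'A', 66226]]

df = pd.DataFrame(data, columns=["Date", "Company", "Revenue YTD"])
df["Date"] = pd.to_datetime(df["Date"])

# One row per (Company, Year), one column per month (9 and 12)
wide = (
    df.assign(Year=df["Date"].dt.year, Month=df["Date"].dt.month)
      .set_index(["Company", "Year", "Month"])["Revenue YTD"]
      .unstack("Month")
)

# December YTD minus September YTD, independent of the original row order
result = (wide[12] - wide[9]).rename("Revenue_Diff").reset_index()
print(result)
```

The subtraction is by month label rather than by position, so shuffling the input rows does not change the result.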

Hope this helps.