Percentage change with panel data using pandas

Time: 2018-03-12 22:16:43

Tags: python pandas dataframe

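I have the following panel data in long format. The column date is the time dimension and, together with supermkt and product, identifies an observation. I want to compute the percentage change over time of the price column (one observation is lost for each supermkt, product pair).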

import pandas as pd

cols = ['date', 'supermkt', 'product', 'price']

data = [['2012-08-01',1,1,83],
['2012-08-02',1,1,68],
['2012-08-03',1,1,94],
['2012-08-04',1,1,98],
['2012-08-05',1,1,101],
['2012-08-01',1,2,21],
['2012-08-02',1,2,6],
['2012-08-03',1,2,6],
['2012-08-04',1,2,4],
['2012-08-05',1,2,12],
['2012-08-01',2,1,78],
['2012-08-02',2,1,88],
['2012-08-03',2,1,48],
['2012-08-04',2,1,48],
['2012-08-05',2,1,48]]

d = pd.DataFrame(data, columns=cols)

Expected output for supermkt = 1, product = 1:

cols = ['date', 'supermkt', 'product', 'price','pct_change']

data = [['2012-08-01',1,1,83,float('nan')],
['2012-08-02',1,1,68,-0.18],
['2012-08-03',1,1,94,0.38],
['2012-08-04',1,1,98,0.04],
['2012-08-05',1,1,101,0.03]]
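
For reference, the percentage change between consecutive observations is taken here to be (current price - previous price) / previous price, so the first row of each group has no value. A minimal plain-Python sketch of that arithmetic for the five prices of supermkt = 1, product = 1 (the variable names are illustrative only):

```python
# Prices for supermkt = 1, product = 1, in date order.
prices = [83, 68, 94, 98, 101]

# The first observation has no predecessor, so its change is undefined (None).
pct = [None] + [(cur - prev) / prev for prev, cur in zip(prices, prices[1:])]

# Round to two decimals for display.
rounded = [None if p is None else round(p, 2) for p in pct]
print(rounded)  # [None, -0.18, 0.38, 0.04, 0.03]
```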

1 answer:

Answer 0: (score: 1)

IIUC, use groupby + pct_change (df here is the DataFrame the question names d):

df.assign(pct_change=df.groupby(['supermkt', 'product']).price.pct_change())

          date  supermkt  product  price  pct_change
0   2012-08-01         1        1     83         NaN
1   2012-08-02         1        1     68   -0.180723
2   2012-08-03         1        1     94    0.382353
3   2012-08-04         1        1     98    0.042553
4   2012-08-05         1        1    101    0.030612
5   2012-08-01         1        2     21         NaN
6   2012-08-02         1        2      6   -0.714286
7   2012-08-03         1        2      6    0.000000
8   2012-08-04         1        2      4   -0.333333
9   2012-08-05         1        2     12    2.000000
10  2012-08-01         2        1     78         NaN
11  2012-08-02         2        1     88    0.128205
12  2012-08-03         2        1     48   -0.454545
13  2012-08-04         2        1     48    0.000000
14  2012-08-05         2        1     48    0.000000
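
A self-contained sketch of the same approach, under one added assumption: since pct_change works in row order, the rows are sorted by date within each (supermkt, product) group first, which matters if the real data is not already ordered. The pct_change_manual column is a hypothetical name, added only to cross-check the result against price / previous price - 1.

```python
import pandas as pd

cols = ['date', 'supermkt', 'product', 'price']
data = [['2012-08-01', 1, 1, 83], ['2012-08-02', 1, 1, 68],
        ['2012-08-03', 1, 1, 94], ['2012-08-04', 1, 1, 98],
        ['2012-08-05', 1, 1, 101],
        ['2012-08-01', 1, 2, 21], ['2012-08-02', 1, 2, 6],
        ['2012-08-03', 1, 2, 6], ['2012-08-04', 1, 2, 4],
        ['2012-08-05', 1, 2, 12],
        ['2012-08-01', 2, 1, 78], ['2012-08-02', 2, 1, 88],
        ['2012-08-03', 2, 1, 48], ['2012-08-04', 2, 1, 48],
        ['2012-08-05', 2, 1, 48]]
df = pd.DataFrame(data, columns=cols)

# Parse the dates and order rows chronologically within each panel unit.
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(['supermkt', 'product', 'date']).reset_index(drop=True)

# Percentage change of price within each (supermkt, product) group.
grouped = df.groupby(['supermkt', 'product'])['price']
df['pct_change'] = grouped.pct_change()

# Cross-check: divide by the previous price in the same group, minus 1.
df['pct_change_manual'] = df['price'] / grouped.shift(1) - 1

print(df)
```

Each group's first row is NaN in both columns, because there is no earlier observation to compare against.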