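I have the following panel data in long format. The column date is the time dimension and, together with supermkt and product, identifies an observation. I want to compute the percentage change over time of the column price (losing one observation for each supermkt/product pair).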
import pandas as pd

cols = ['date', 'supermkt', 'product', 'price']
data = [['2012-08-01',1,1,83],
['2012-08-02',1,1,68],
['2012-08-03',1,1,94],
['2012-08-04',1,1,98],
['2012-08-05',1,1,101],
['2012-08-01',1,2,21],
['2012-08-02',1,2,6],
['2012-08-03',1,2,6],
['2012-08-04',1,2,4],
['2012-08-05',1,2,12],
['2012-08-01',2,1,78],
['2012-08-02',2,1,88],
['2012-08-03',2,1,48],
['2012-08-04',2,1,48],
['2012-08-05',2,1,48]]
df = pd.DataFrame(data, columns=cols)
Expected output for supermkt = 1, product = 1:
cols = ['date', 'supermkt', 'product', 'price','pct_change']
data = [['2012-08-01',1,1,83,float('nan')],
['2012-08-02',1,1,68,-0.18],
['2012-08-03',1,1,94,0.38],
['2012-08-04',1,1,98,0.04],
['2012-08-05',1,1,101,0.03]]
Answer 0 (score: 1)
IIUC, use groupby + pct_change:
df.assign(pct_change=df.groupby(['supermkt', 'product']).price.pct_change())
          date  supermkt  product  price  pct_change
0   2012-08-01         1        1     83         NaN
1   2012-08-02         1        1     68   -0.180723
2   2012-08-03         1        1     94    0.382353
3   2012-08-04         1        1     98    0.042553
4   2012-08-05         1        1    101    0.030612
5   2012-08-01         1        2     21         NaN
6   2012-08-02         1        2      6   -0.714286
7   2012-08-03         1        2      6    0.000000
8   2012-08-04         1        2      4   -0.333333
9   2012-08-05         1        2     12    2.000000
10  2012-08-01         2        1     78         NaN
11  2012-08-02         2        1     88    0.128205
12  2012-08-03         2        1     48   -0.454545
13  2012-08-04         2        1     48    0.000000
14  2012-08-05         2        1     48    0.000000
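Note that pct_change works on row order, not on the date column, so the result above is only correct because each group is already sorted by date. The following is a minimal sketch, assuming the rows might arrive unsorted: it sorts first, computes the change per group, and checks it against the manual formula price / previous price - 1. The shortened sample data and the column name pct_change_manual are illustrative, not from the question.

```python
import pandas as pd

cols = ['date', 'supermkt', 'product', 'price']
data = [['2012-08-02', 1, 1, 68],   # deliberately out of date order
        ['2012-08-01', 1, 1, 83],
        ['2012-08-03', 1, 1, 94],
        ['2012-08-01', 2, 1, 78],
        ['2012-08-02', 2, 1, 88]]
df = pd.DataFrame(data, columns=cols)

# Parse the dates and sort so that "previous row" means "previous date"
# within each supermkt/product group.
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(['supermkt', 'product', 'date']).reset_index(drop=True)

grouped = df.groupby(['supermkt', 'product'])['price']

# Built-in per-group percentage change.
df['pct_change'] = grouped.pct_change()

# The same quantity by hand: current price over the previous price
# in the same group, minus one.
df['pct_change_manual'] = df['price'] / grouped.shift(1) - 1

# Drop the first observation of each group, whose change is NaN.
result = df.dropna(subset=['pct_change'])
print(result)
```

Dropping the NaN rows is optional; it matches the "losing one observation" wording in the question, but keeping them preserves the original row count.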