Select specific dates and calculate pct_change of values in pandas

Time: 2019-12-20 01:49:35

Tags: python pandas dataframe

For each group of city and district in the following dataframe, I want to calculate the percentage change of price in the months 2019-06 and 2019-12 compared with 2019-03, using the 2019-03 price as the base value.

   city district     date  price
0     a        c  2019-01   9.99
1     a        c  2019-02  10.66
2     a        c  2019-03  10.56
3     a        c  2019-04  10.06
4     a        c  2019-05  10.69
5     a        c  2019-06  10.77
6     a        c  2019-07  10.67
7     a        c  2019-08  10.51
8     a        c  2019-09  10.28
9     a        c  2019-10  10.05
10    a        c  2019-11   9.72
11    a        c  2019-12   9.98
12    b        d  2019-01   6.32
13    b        d  2019-02   6.32
14    b        d  2019-03   6.32
15    b        d  2019-04   6.32
16    b        d  2019-05   6.32
17    b        d  2019-06   6.00
18    b        d  2019-07   6.00
19    b        d  2019-08   6.00
20    b        d  2019-09   6.00
21    b        d  2019-10   6.00
22    b        d  2019-11   6.00
23    b        d  2019-12   5.65

How can I get the expected result shown below? Thanks. My own attempt clearly did not give what I need.

Expected output:

   city district     date  price       pct
0     a        c  2019-01   9.99       NaN
1     a        c  2019-02  10.66       NaN
2     a        c  2019-03  10.56       NaN
3     a        c  2019-04  10.06       NaN
4     a        c  2019-05  10.69       NaN
5     a        c  2019-06  10.77  0.019886
6     a        c  2019-07  10.67       NaN
7     a        c  2019-08  10.51       NaN
8     a        c  2019-09  10.28       NaN
9     a        c  2019-10  10.05       NaN
10    a        c  2019-11   9.72       NaN
11    a        c  2019-12   9.98 -0.054924
12    b        d  2019-01   6.32       NaN
13    b        d  2019-02   6.32       NaN
14    b        d  2019-03   6.32       NaN
15    b        d  2019-04   6.32       NaN
16    b        d  2019-05   6.32       NaN
17    b        d  2019-06   6.00 -0.050633
18    b        d  2019-07   6.00       NaN
19    b        d  2019-08   6.00       NaN
20    b        d  2019-09   6.00       NaN
21    b        d  2019-10   6.00       NaN
22    b        d  2019-11   6.00       NaN
23    b        d  2019-12   5.65 -0.106013

2 answers:

Answer 0: (score: 2)

You can filter the rows with isin, then use groupby with transform('first') to get the first value of each group, and divide by it:

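# Mask for the base month and the two months to compare
m = df["date"].isin(['2019-01', '2019-06', '2019-12'])
# First price of each group within the masked rows, i.e. the base value
s = df[m].groupby(["city","district"])['price'].transform('first')

# Percentage change relative to the base; unmasked rows stay NaN
df.loc[m, 'pct1'] = df.loc[m, 'price'].div(s).sub(1)
print(df)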
   city district     date  price      pct1
0     a        c  2019-01   9.99  0.000000
1     a        c  2019-02  10.66       NaN
2     a        c  2019-03  10.56       NaN
3     a        c  2019-04  10.06       NaN
4     a        c  2019-05  10.69       NaN
5     a        c  2019-06  10.77  0.078078
6     a        c  2019-07  10.67       NaN
7     a        c  2019-08  10.51       NaN
8     a        c  2019-09  10.28       NaN
9     a        c  2019-10  10.05       NaN
10    a        c  2019-11   9.72       NaN
11    a        c  2019-12   9.98 -0.001001
12    b        d  2019-01   6.32  0.000000
13    b        d  2019-02   6.32       NaN
14    b        d  2019-03   6.32       NaN
15    b        d  2019-04   6.32       NaN
16    b        d  2019-05   6.32       NaN
17    b        d  2019-06   6.00 -0.050633
18    b        d  2019-07   6.00       NaN
19    b        d  2019-08   6.00       NaN
20    b        d  2019-09   6.00       NaN
21    b        d  2019-10   6.00       NaN
22    b        d  2019-11   6.00       NaN
23    b        d  2019-12   5.65 -0.106013
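
The output above is relative to 2019-01, because 'first' picks the earliest row that survives the mask. If the base should be 2019-03 instead, swapping the month in the isin list should be enough. Here is a minimal, self-contained sketch under that assumption. The names base_month and target_months are illustrative only, and it relies on the rows being sorted by date within each group.

```python
import pandas as pd

# Rebuild the sample data from the question.
months = [f"2019-{m:02d}" for m in range(1, 13)]
prices_a = [9.99, 10.66, 10.56, 10.06, 10.69, 10.77,
            10.67, 10.51, 10.28, 10.05, 9.72, 9.98]
prices_b = [6.32] * 5 + [6.00] * 6 + [5.65]
df = pd.DataFrame({
    "city": ["a"] * 12 + ["b"] * 12,
    "district": ["c"] * 12 + ["d"] * 12,
    "date": months * 2,
    "price": prices_a + prices_b,
})

base_month = "2019-03"                   # assumed base month (illustrative name)
target_months = ["2019-06", "2019-12"]   # months to compare against the base

# Keep the base month and the target months only. Rows must be sorted by
# date within each group so that 'first' lands on the base month.
m = df["date"].isin([base_month] + target_months)
s = df[m].groupby(["city", "district"])["price"].transform("first")

df.loc[m, "pct"] = df.loc[m, "price"].div(s).sub(1)
# The base row itself is always 0; blank it so only the targets remain.
df.loc[df["date"].eq(base_month), "pct"] = float("nan")

print(df[df["pct"].notna()])
```

Only the four target rows (indices 5, 11, 17 and 23) should end up with a value in pct.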

Answer 1: (score: 1)

First calculate the percentage for every row, then set the months you do not need to NaN:

import numpy as np

# transform keeps the original index, so the result aligns with df
df["pct"] = df.groupby(["city","district"])['price'].transform(lambda x: x / x.iat[0] - 1)
df.loc[~df["date"].isin(['2019-06', '2019-12']), "pct"] = np.nan

print(df)

   city district     date  price       pct
0     a        c  2019-01   9.99       NaN
1     a        c  2019-02  10.66       NaN
2     a        c  2019-03  10.56       NaN
3     a        c  2019-04  10.06       NaN
4     a        c  2019-05  10.69       NaN
5     a        c  2019-06  10.77  0.078078
6     a        c  2019-07  10.67       NaN
7     a        c  2019-08  10.51       NaN
8     a        c  2019-09  10.28       NaN
9     a        c  2019-10  10.05       NaN
10    a        c  2019-11   9.72       NaN
11    a        c  2019-12   9.98 -0.001001
12    b        d  2019-01   6.32       NaN
13    b        d  2019-02   6.32       NaN
14    b        d  2019-03   6.32       NaN
15    b        d  2019-04   6.32       NaN
16    b        d  2019-05   6.32       NaN
17    b        d  2019-06   6.00 -0.050633
18    b        d  2019-07   6.00       NaN
19    b        d  2019-08   6.00       NaN
20    b        d  2019-09   6.00       NaN
21    b        d  2019-10   6.00       NaN
22    b        d  2019-11   6.00       NaN
23    b        d  2019-12   5.65 -0.106013
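
Note that x.iat[0] picks the base by position, so the result depends on the rows being sorted by date. As a hedged alternative, the base price can be looked up by month and joined back on the group keys. The sketch below is one possible way to do that, not part of the original answer. The names base_month, target_months and base_price are illustrative, 2019-03 is assumed as the base month, and the rows are shuffled on purpose to show that order does not matter.

```python
import pandas as pd

# Rebuild the sample data from the question.
months = [f"2019-{m:02d}" for m in range(1, 13)]
prices_a = [9.99, 10.66, 10.56, 10.06, 10.69, 10.77,
            10.67, 10.51, 10.28, 10.05, 9.72, 9.98]
prices_b = [6.32] * 5 + [6.00] * 6 + [5.65]
df = pd.DataFrame({
    "city": ["a"] * 12 + ["b"] * 12,
    "district": ["c"] * 12 + ["d"] * 12,
    "date": months * 2,
    "price": prices_a + prices_b,
})

# Shuffle the rows to show that the lookup does not rely on row order.
df = df.sample(frac=1, random_state=0)

base_month = "2019-03"                   # assumed base month (illustrative name)
target_months = ["2019-06", "2019-12"]
keys = ["city", "district"]

# One base price per (city, district), selected by month rather than position.
base = (df.loc[df["date"].eq(base_month)]
          .set_index(keys)["price"]
          .rename("base_price"))

# Attach the base price to every row through its group keys.
out = df.join(base, on=keys)
# Compute the change everywhere, then keep it only for the target months.
out["pct"] = (out["price"] / out["base_price"] - 1).where(out["date"].isin(target_months))
out = out.drop(columns="base_price").sort_index()

print(out[out["pct"].notna()])
```

This assumes each group has exactly one row for the base month. A group with no base row would get NaN throughout, and duplicate base rows would duplicate output rows.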

Or create a mask for months 1, 6 and 12, then calculate the percentage change within that subset:

# Filter to months 1, 6 and 12, then divide by the first price of each group
df["pct"] = (df[df["date"].isin(['2019-01', '2019-06', '2019-12'])]
               .groupby(["city","district"])['price']
               .transform(lambda x: x / x.iat[0] - 1))

print (df)

   city district     date  price       pct
0     a        c  2019-01   9.99  0.000000
1     a        c  2019-02  10.66       NaN
2     a        c  2019-03  10.56       NaN
3     a        c  2019-04  10.06       NaN
4     a        c  2019-05  10.69       NaN
5     a        c  2019-06  10.77  0.078078
6     a        c  2019-07  10.67       NaN
7     a        c  2019-08  10.51       NaN
8     a        c  2019-09  10.28       NaN
9     a        c  2019-10  10.05       NaN
10    a        c  2019-11   9.72       NaN
11    a        c  2019-12   9.98 -0.001001
12    b        d  2019-01   6.32  0.000000
13    b        d  2019-02   6.32       NaN
14    b        d  2019-03   6.32       NaN
15    b        d  2019-04   6.32       NaN
16    b        d  2019-05   6.32       NaN
17    b        d  2019-06   6.00 -0.050633
18    b        d  2019-07   6.00       NaN
19    b        d  2019-08   6.00       NaN
20    b        d  2019-09   6.00       NaN
21    b        d  2019-10   6.00       NaN
22    b        d  2019-11   6.00       NaN
23    b        d  2019-12   5.65 -0.106013