For each group of city and district in the data frame below, I want to calculate the percentage change of price in 2019-06 and 2019-12, using the price in 2019-01 as the base value.
How can I get the expected result? Thanks.
city district date price
0 a c 2019-01 9.99
1 a c 2019-02 10.66
2 a c 2019-03 10.56
3 a c 2019-04 10.06
4 a c 2019-05 10.69
5 a c 2019-06 10.77
6 a c 2019-07 10.67
7 a c 2019-08 10.51
8 a c 2019-09 10.28
9 a c 2019-10 10.05
10 a c 2019-11 9.72
11 a c 2019-12 9.98
12 b d 2019-01 6.32
13 b d 2019-02 6.32
14 b d 2019-03 6.32
15 b d 2019-04 6.32
16 b d 2019-05 6.32
17 b d 2019-06 6.00
18 b d 2019-07 6.00
19 b d 2019-08 6.00
20 b d 2019-09 6.00
21 b d 2019-10 6.00
22 b d 2019-11 6.00
23 b d 2019-12 5.65
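To make the examples reproducible, the table above can be rebuilt as a DataFrame. This is a minimal sketch that assumes date is stored as plain strings such as "2019-01" (the isin calls further down compare against strings), not as datetimes.

```python
import pandas as pd

# Assumption: dates are kept as "YYYY-MM" strings, as in the printed table.
months = [f"2019-{m:02d}" for m in range(1, 13)]

df = pd.DataFrame({
    "city": ["a"] * 12 + ["b"] * 12,
    "district": ["c"] * 12 + ["d"] * 12,
    "date": months * 2,
    "price": [9.99, 10.66, 10.56, 10.06, 10.69, 10.77,
              10.67, 10.51, 10.28, 10.05, 9.72, 9.98,
              6.32, 6.32, 6.32, 6.32, 6.32, 6.00,
              6.00, 6.00, 6.00, 6.00, 6.00, 5.65],
})
print(df)
```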
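I just tried code that ends up using the 2019-03 price as the base, which is obviously not what I need.
Output of the current code: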
city district date price pct
0 a c 2019-01 9.99 NaN
1 a c 2019-02 10.66 NaN
2 a c 2019-03 10.56 NaN
3 a c 2019-04 10.06 NaN
4 a c 2019-05 10.69 NaN
5 a c 2019-06 10.77 0.019886
6 a c 2019-07 10.67 NaN
7 a c 2019-08 10.51 NaN
8 a c 2019-09 10.28 NaN
9 a c 2019-10 10.05 NaN
10 a c 2019-11 9.72 NaN
11 a c 2019-12 9.98 -0.054924
12 b d 2019-01 6.32 NaN
13 b d 2019-02 6.32 NaN
14 b d 2019-03 6.32 NaN
15 b d 2019-04 6.32 NaN
16 b d 2019-05 6.32 NaN
17 b d 2019-06 6.00 -0.050633
18 b d 2019-07 6.00 NaN
19 b d 2019-08 6.00 NaN
20 b d 2019-09 6.00 NaN
21 b d 2019-10 6.00 NaN
22 b d 2019-11 6.00 NaN
23 b d 2019-12 5.65 -0.106013
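The non-NaN values above can be checked by hand. The sketch below uses only prices from the table for group a/c; the helper name pct_change is made up for illustration. It suggests the numbers were computed against 2019-03 rather than 2019-01. For group b/d the price is 6.32 in both months, so the two bases give the same result there.

```python
# Prices for city "a", district "c", copied from the table.
price_a = {"2019-01": 9.99, "2019-03": 10.56, "2019-06": 10.77, "2019-12": 9.98}

def pct_change(value, base):
    """Relative change of value against base (hypothetical helper)."""
    return value / base - 1

# Against 2019-03: matches the output of the current code.
june_vs_march = round(pct_change(price_a["2019-06"], price_a["2019-03"]), 6)
dec_vs_march = round(pct_change(price_a["2019-12"], price_a["2019-03"]), 6)

# Against 2019-01: the values the question asks for.
june_vs_jan = round(pct_change(price_a["2019-06"], price_a["2019-01"]), 6)
dec_vs_jan = round(pct_change(price_a["2019-12"], price_a["2019-01"]), 6)

print(june_vs_march, dec_vs_march)
print(june_vs_jan, dec_vs_jan)
```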
Answer 0 (score: 2)
You can filter the rows with isin, then use groupby with transform to get the first value of each group and divide by it:
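# Keep only the base month and the two target months.
m = df["date"].isin(['2019-01', '2019-06', '2019-12'])
# First price per group among those rows (2019-01 when rows are sorted by date).
s = df[m].groupby(["city", "district"])['price'].transform('first')
# price / base - 1, written only into the selected rows.
df.loc[m, 'pct1'] = df.loc[m, 'price'].div(s).sub(1)
print(df)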
city district date price pct1
0 a c 2019-01 9.99 0.000000
1 a c 2019-02 10.66 NaN
2 a c 2019-03 10.56 NaN
3 a c 2019-04 10.06 NaN
4 a c 2019-05 10.69 NaN
5 a c 2019-06 10.77 0.078078
6 a c 2019-07 10.67 NaN
7 a c 2019-08 10.51 NaN
8 a c 2019-09 10.28 NaN
9 a c 2019-10 10.05 NaN
10 a c 2019-11 9.72 NaN
11 a c 2019-12 9.98 -0.001001
12 b d 2019-01 6.32 0.000000
13 b d 2019-02 6.32 NaN
14 b d 2019-03 6.32 NaN
15 b d 2019-04 6.32 NaN
16 b d 2019-05 6.32 NaN
17 b d 2019-06 6.00 -0.050633
18 b d 2019-07 6.00 NaN
19 b d 2019-08 6.00 NaN
20 b d 2019-09 6.00 NaN
21 b d 2019-10 6.00 NaN
22 b d 2019-11 6.00 NaN
23 b d 2019-12 5.65 -0.106013
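Taking the first value per group works here because 2019-01 is the first selected row in each group. If the rows might not be sorted by date, one possible variant (not part of the answer above) is to look up the 2019-01 price explicitly and merge it back. The names base, out and base_price are illustrative, and the rows are shuffled on purpose to show that order does not matter.

```python
import pandas as pd

months = [f"2019-{m:02d}" for m in range(1, 13)]
df = pd.DataFrame({
    "city": ["a"] * 12 + ["b"] * 12,
    "district": ["c"] * 12 + ["d"] * 12,
    "date": months * 2,
    "price": [9.99, 10.66, 10.56, 10.06, 10.69, 10.77,
              10.67, 10.51, 10.28, 10.05, 9.72, 9.98,
              6.32, 6.32, 6.32, 6.32, 6.32, 6.00,
              6.00, 6.00, 6.00, 6.00, 6.00, 5.65],
})
# Shuffle the rows so that "first row per group" is no longer 2019-01.
df = df.sample(frac=1, random_state=0).reset_index(drop=True)

# One base price per (city, district), taken from the 2019-01 row.
base = (df.loc[df["date"] == "2019-01", ["city", "district", "price"]]
          .rename(columns={"price": "base_price"}))

# Attach the base price to every row of the same group.
out = df.merge(base, on=["city", "district"], how="left")

# Keep the percentage change only for the two target months.
is_target = out["date"].isin(["2019-06", "2019-12"])
out["pct"] = (out["price"] / out["base_price"] - 1).where(is_target)

print(out.loc[is_target, ["city", "district", "date", "pct"]]
         .sort_values(["city", "date"]))
```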
Answer 1 (score: 1)
First calculate all the percentages, then set the months you don't need to NaN:
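import numpy as np
# transform keeps the original row index, so the result aligns with df.
df["pct"] = df.groupby(["city", "district"])['price'].transform(lambda x: x / x.iat[0] - 1)
# Blank out every month except the two target months.
df.loc[~df["date"].isin(['2019-06', '2019-12']), "pct"] = np.nan
print(df)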
city district date price pct
0 a c 2019-01 9.99 NaN
1 a c 2019-02 10.66 NaN
2 a c 2019-03 10.56 NaN
3 a c 2019-04 10.06 NaN
4 a c 2019-05 10.69 NaN
5 a c 2019-06 10.77 0.078078
6 a c 2019-07 10.67 NaN
7 a c 2019-08 10.51 NaN
8 a c 2019-09 10.28 NaN
9 a c 2019-10 10.05 NaN
10 a c 2019-11 9.72 NaN
11 a c 2019-12 9.98 -0.001001
12 b d 2019-01 6.32 NaN
13 b d 2019-02 6.32 NaN
14 b d 2019-03 6.32 NaN
15 b d 2019-04 6.32 NaN
16 b d 2019-05 6.32 NaN
17 b d 2019-06 6.00 -0.050633
18 b d 2019-07 6.00 NaN
19 b d 2019-08 6.00 NaN
20 b d 2019-09 6.00 NaN
21 b d 2019-10 6.00 NaN
22 b d 2019-11 6.00 NaN
23 b d 2019-12 5.65 -0.106013
Or create a mask for months 1, 6 and 12, then calculate the percentage changes on those rows only:
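# isin is element-wise, so the mask does not need a groupby.
mask = df["date"].isin(['2019-01', '2019-06', '2019-12'])
# Rows outside the mask get NaN when the result is aligned back to df.
df["pct"] = (df[mask]
             .groupby(["city", "district"])['price']
             .transform(lambda x: x / x.iat[0] - 1))
print(df)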
city district date price pct
0 a c 2019-01 9.99 0.000000
1 a c 2019-02 10.66 NaN
2 a c 2019-03 10.56 NaN
3 a c 2019-04 10.06 NaN
4 a c 2019-05 10.69 NaN
5 a c 2019-06 10.77 0.078078
6 a c 2019-07 10.67 NaN
7 a c 2019-08 10.51 NaN
8 a c 2019-09 10.28 NaN
9 a c 2019-10 10.05 NaN
10 a c 2019-11 9.72 NaN
11 a c 2019-12 9.98 -0.001001
12 b d 2019-01 6.32 0.000000
13 b d 2019-02 6.32 NaN
14 b d 2019-03 6.32 NaN
15 b d 2019-04 6.32 NaN
16 b d 2019-05 6.32 NaN
17 b d 2019-06 6.00 -0.050633
18 b d 2019-07 6.00 NaN
19 b d 2019-08 6.00 NaN
20 b d 2019-09 6.00 NaN
21 b d 2019-10 6.00 NaN
22 b d 2019-11 6.00 NaN
23 b d 2019-12 5.65 -0.106013
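If only the per-group numbers are needed, rather than a new column on the original frame, a compact summary table may be easier to read. The sketch below is an alternative layout, not something either answer proposes. It assumes one row per (city, district, date), and the names wide and summary are illustrative.

```python
import pandas as pd

months = [f"2019-{m:02d}" for m in range(1, 13)]
df = pd.DataFrame({
    "city": ["a"] * 12 + ["b"] * 12,
    "district": ["c"] * 12 + ["d"] * 12,
    "date": months * 2,
    "price": [9.99, 10.66, 10.56, 10.06, 10.69, 10.77,
              10.67, 10.51, 10.28, 10.05, 9.72, 9.98,
              6.32, 6.32, 6.32, 6.32, 6.32, 6.00,
              6.00, 6.00, 6.00, 6.00, 6.00, 5.65],
})

# One row per (city, district), one column per month.
wide = df.set_index(["city", "district", "date"])["price"].unstack("date")

# Divide the target months by the 2019-01 column, row by row, then subtract 1.
summary = wide[["2019-06", "2019-12"]].div(wide["2019-01"], axis=0).sub(1)

print(summary.round(6))
```

Because the base month is selected by its column label, this layout does not depend on the row order of df.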