My data frame looks like the following, with an ID (code) and the length and area recorded at specific distances (Dist_km):
code Dist_km Shape_Leng Shape_Area
0 M0017 5.0 57516.601608 5.076465e+07
1 M0017 10.0 94037.663673 4.638184e+07
2 M0017 15.0 39106.310470 1.426327e+07
3 M0017 20.0 138.038115 6.464380e+02
4 M0017 30.0 12158.395200 4.102351e+06
5 M0073 5.0 51922.847698 3.375080e+07
6 M0073 10.0 75543.660382 5.966612e+07
7 M0073 15.0 55277.027428 3.423961e+07
8 M0073 20.0 26945.782055 2.584022e+07
9 M0073 25.0 4052.670711 6.904536e+05
10 M0333 5.0 30090.687597 5.468791e+07
11 M0333 10.0 55946.815385 5.768929e+07
12 M0333 15.0 65026.329732 4.008600e+07
13 M0333 20.0 59014.487216 2.994337e+07
14 M0333 25.0 17423.635441 6.358991e+06
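Using:
# Running total of Shape_Area within each code (square metres)
mrb['cum_area_sqm'] = mrb.groupby('code')['Shape_Area'].cumsum()
# Convert square metres to hectares
mrb['cum_area_ha'] = mrb['cum_area_sqm'] / 10000
# One row per (code, Dist_km) pair
mrb_cumsum = mrb.groupby(['code', 'Dist_km']).agg({'cum_area_ha': 'sum'})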
I managed to convert the data frame into the following:
cum_area_ha
code Dist_km
M0017 5.0 5076.464548
10.0 9714.648238
15.0 11140.974881
20.0 11141.039525
30.0 11551.274623
M0073 5.0 3375.080465
10.0 9341.692680
15.0 12765.654064
20.0 15349.676332
25.0 15418.721691
M0333 5.0 5468.790981
10.0 11237.720454
15.0 15246.320869
20.0 18240.658255
25.0 18876.557351
However, I would now like the cumulative percentage for each Dist_km within each code, reaching a maximum of 100%.
So, for example, for M0017, I would like something like the following:
cum_area_ha cum_area_pc
code Dist_km
M0017 5.0 5076.464548 43.49
10.0 9714.648238 84.10
15.0 11140.974881 96.45
20.0 11141.039525 96.45
30.0 11551.274623 100.00
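Answer 0: (score: 1)
You can divide each element by the last cum_area_ha in the same code group. Because cum_area_ha is a running total, the last value in each group is that group's total, so the ratio ends at 1.0 for every code.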
mrb_cumsum.div(mrb_cumsum.groupby(level=0).last())
Out[97]:
cum_area_ha
code Dist_km
M0017 5.0 0.439472
10.0 0.841002
15.0 0.964480
20.0 0.964486
30.0 1.000000
M0073 5.0 0.218895
10.0 0.605867
15.0 0.827932
20.0 0.995522
25.0 1.000000
M0333 5.0 0.289713
10.0 0.595327
15.0 0.807685
20.0 0.966313
25.0 1.000000
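To get the result as a percentage column next to cum_area_ha, as the question asks, one option is to broadcast each group's last value back onto its rows and scale by 100. The following is only a sketch: it rebuilds part of mrb_cumsum from the figures in the question (M0017 and M0073 only), and the column name cum_area_pc is taken from the question's desired output. The use of transform("last") in place of div is a substitution for illustration, not the answer's exact method.

```python
import pandas as pd

# Cumulative areas (ha) copied from the question, for M0017 and M0073 only.
mrb_cumsum = pd.DataFrame(
    {
        "code": ["M0017"] * 5 + ["M0073"] * 5,
        "Dist_km": [5.0, 10.0, 15.0, 20.0, 30.0, 5.0, 10.0, 15.0, 20.0, 25.0],
        "cum_area_ha": [
            5076.464548, 9714.648238, 11140.974881, 11141.039525, 11551.274623,
            3375.080465, 9341.692680, 12765.654064, 15349.676332, 15418.721691,
        ],
    }
).set_index(["code", "Dist_km"])

# Last cumulative value per code, repeated on every row of that code.
group_total = mrb_cumsum.groupby(level="code")["cum_area_ha"].transform("last")

# Share of the group total, as a percentage rounded to two decimals.
mrb_cumsum["cum_area_pc"] = (mrb_cumsum["cum_area_ha"] / group_total * 100).round(2)

print(mrb_cumsum)
```

This assumes the rows within each code are already sorted by Dist_km, so that the last row holds the group total. If that is not guaranteed, transform("max") should give the same result for a non-decreasing running total.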