I'm trying to summarize a pandas DataFrame and compute a "percent of total" column from the groupby result of the original df.
Original df:
        Shape_Area                       LU
0  91254232.781776          Fallow Cropland
1    522096.071094  Mixed Wetland Hardwoods
2     87795.467187  Mixed Wetland Hardwoods
3       440.528367  Mixed Wetland Hardwoods
4    778952.154436         Dikes and Levees
Groupby result:
                              Shape_Area
LU
Dikes and Levees           778952.154436
Fallow Cropland          91254232.781776
Mixed Wetland Hardwoods    610332.066649
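I want to add an extra "percent of total" column for each LU type. I'm not sure I'm accessing the groupby result correctly, and I may not understand what it is (a Series?).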
import pandas as pd

# narr is defined elsewhere in the asker's code (not shown)
df = pd.DataFrame(narr, columns=['LU','Shape_Area'])
df = df.groupby(['LU'])[['Shape_Area']].sum()
# print the grouped result shown above
print(df)
Answer 0 (score: 1)
You can simply compute the sum of the Shape_Area Series (which returns a scalar), then divide each row of Shape_Area in the grouped DataFrame by that value.
grouped = df.groupby(['LU'])[['Shape_Area']].sum()
grouped['pct'] = grouped['Shape_Area'] / grouped['Shape_Area'].sum()
                           Shape_Area       pct
LU
Dikes and Levees         7.789522e+05  0.008408
Fallow Cropland          9.125423e+07  0.985004
Mixed Wetland Hardwoods  6.103321e+05  0.006588
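For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the DataFrame from the five sample rows in the question, since narr is not shown. It also checks what the groupby returns: selecting with double brackets ([['Shape_Area']]) gives a DataFrame, while single brackets (['Shape_Area']) give a Series. The names as_series and pct_display are illustrative additions and do not appear in the original post.

```python
import pandas as pd

# Sample rows copied from the question (stands in for the undefined narr).
df = pd.DataFrame({
    "LU": [
        "Fallow Cropland",
        "Mixed Wetland Hardwoods",
        "Mixed Wetland Hardwoods",
        "Mixed Wetland Hardwoods",
        "Dikes and Levees",
    ],
    "Shape_Area": [
        91254232.781776,
        522096.071094,
        87795.467187,
        440.528367,
        778952.154436,
    ],
})

# Double brackets keep the result as a DataFrame, so new columns can be added.
grouped = df.groupby("LU")[["Shape_Area"]].sum()
print(type(grouped).__name__)

# Single brackets return a Series indexed by LU (illustrative name).
as_series = df.groupby("LU")["Shape_Area"].sum()
print(type(as_series).__name__)

# Fraction of the grand total for each LU type, as in the answer above.
grouped["pct"] = grouped["Shape_Area"] / grouped["Shape_Area"].sum()

# Optional: the same share as a percentage rounded to two decimals
# (hypothetical extra column, not part of the original answer).
grouped["pct_display"] = (grouped["pct"] * 100).round(2)

print(grouped)
```

The pct column holds fractions that sum to 1; multiply by 100 if you would rather show percentages.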