I have a dataset that looks like this:
Location  Type  Number
House     A     4
          B     1
Garden    A     3
          B     2
I am trying to find a way to create a column holding the proportion of type B within each location.
Expected output:
Location  Type  Number  Proportion_B
House     A     4       20%
          B     1       20%
Garden    A     3       40%
          B     2       40%
How can I achieve this?
Answer 0 (score: 0)
Use:
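# create a MultiIndex
df1 = df.set_index(['Location','Type'])
# if necessary, aggregate the sum per both levels
# (sum(level=...) was removed in pandas 2.0, so group by the index levels instead)
#df1 = df1.groupby(level=[0,1]).sum()
# select the B level and divide by the per-Location sum
df2 = df1.xs('B', level=1).div(df1.groupby(level=0, sort=False).sum()).mul(100).add_prefix('prop_B_')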
print(df2)
          prop_B_Number
Location               
House              20.0
Garden             40.0
# join to the original DataFrame
df = df.join(df2, on='Location')
print(df)
  Location Type  Number  prop_B_Number
0    House    A       4           20.0
1    House    B       1           20.0
2   Garden    A       3           40.0
3   Garden    B       2           40.0
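For anyone who wants to run the MultiIndex approach end to end, here is a minimal self-contained sketch. It assumes the question's table is loaded with `Location` filled in on every row (the sample `df` below is reconstructed from the question, not taken from the asker's real data) and uses `groupby(level=...)` so it also runs on pandas 2.x.

```python
import pandas as pd

# Sample data reconstructed from the question
# (assumption: Location is present on every row).
df = pd.DataFrame({
    "Location": ["House", "House", "Garden", "Garden"],
    "Type": ["A", "B", "A", "B"],
    "Number": [4, 1, 3, 2],
})

df1 = df.set_index(["Location", "Type"])

# Total per Location; sort=False keeps the groups in their original order.
totals = df1.groupby(level="Location", sort=False).sum()

# Type B rows, indexed by Location, as a percentage of the totals.
df2 = df1.xs("B", level="Type").div(totals).mul(100).add_prefix("prop_B_")

# Attach the per-Location percentage to every original row.
result = df.join(df2, on="Location")
print(result)
```

The division aligns on the `Location` index, so it does not depend on the two frames listing the locations in the same order.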
Answer 1 (score: 0)
I tried it this way:
temp = df.groupby('Location').apply(lambda x: ((x[x['Type']=='B']['Number']/x['Number'].sum())*100)).reset_index().rename(columns={'Number':'Proportion_B'})
temp = temp[['Location','Proportion_B']]
temp['Proportion_B'] = temp['Proportion_B'].astype(str).str.replace(r'\.0$', '', regex=True) + '%'  # regex=True: pandas >= 2.0 matches the pattern literally by default
df = pd.merge(df, temp, how='left', on=['Location'])
Output:
  Location Type  Number Proportion_B
0    House    A       4          20%
1    House    B       1          20%
2   Garden    A       3          40%
3   Garden    B       2          40%
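Explanation:
Group by `Location`, then divide the type B number by the group total and save the result. Then merge that temporary result back into the original df on `Location` using a `left` merge. Note: lines 2 and 3 are only there to reproduce the sample output exactly.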
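The three steps in the explanation (group, divide, left merge) can also be written out one at a time. The sketch below is a variation on the answer's code, not the same code: it swaps the `apply` call for two plain grouped sums, and the sample `df` is reconstructed from the question. The `reindex(..., fill_value=0)` call is an added assumption that a location without any type B rows should show 0%.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Location": ["House", "House", "Garden", "Garden"],
    "Type": ["A", "B", "A", "B"],
    "Number": [4, 1, 3, 2],
})

# Step 1: group by Location; sum the type B rows and all rows separately.
total = df.groupby("Location")["Number"].sum()
b_sum = df[df["Type"] == "B"].groupby("Location")["Number"].sum()
# Assumption: a Location with no type B rows counts as 0.
b_sum = b_sum.reindex(total.index, fill_value=0)

# Step 2: divide B by the total and save the result as a small lookup table.
temp = (b_sum / total * 100).rename("Proportion_B").reset_index()

# Step 3: left merge the lookup table back onto the original rows.
merged = pd.merge(df, temp, how="left", on="Location")

# Optional: format as a percent string such as "20%" to match the sample output.
merged["Proportion_B"] = merged["Proportion_B"].round().astype(int).astype(str) + "%"
print(merged)
```

Rounding before the string conversion drops fractional percentages; skip that last step if the numeric value is needed later.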
Answer 2 (score: 0)
Maybe something like this:
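df_temp = df.groupby('Location').apply(lambda x: ((x[x['Type']=='B']['Number']/x['Number'].sum())*100)).reset_index().rename(columns={'Number':'Proportion_B'})
# keep only the key and the new column so the merge does not carry over the leftover 'level_1' index column
df = pd.merge(df, df_temp[['Location','Proportion_B']], how='left', on=['Location'])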
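If the merge step is not needed, the same column can probably be built with `groupby(...).transform`, which returns one value per original row. This is an alternative sketch rather than what the answers above do; the sample `df` is reconstructed from the question, and the intermediate names are illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Location": ["House", "House", "Garden", "Garden"],
    "Type": ["A", "B", "A", "B"],
    "Number": [4, 1, 3, 2],
})

# Keep Number on type B rows, use 0 elsewhere, then sum within each Location.
b_per_location = (
    df["Number"]
    .where(df["Type"].eq("B"), 0)
    .groupby(df["Location"])
    .transform("sum")
)

# Sum of all rows within each Location, broadcast back to every row.
total_per_location = df.groupby("Location")["Number"].transform("sum")

out = df.assign(Proportion_B=b_per_location / total_per_location * 100)
print(out)
```

Because `transform` keeps the original index, no temporary frame or `level_1` column is created.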