In this data frame:
City Province Sales
0 Toronto ON 13
1 Montreal QC 6
2 Vancouver BC 16
3 Calgary AL 8
4 Edmonton AL 4
5 Winnipeg MN 3
6 Windsor ON 1
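I want to drop the rows whose province accounts for less than 15% of total sales. For example, in this case the resulting data frame would be: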
City Province Sales
0 Toronto ON 13
1 Vancouver BC 16
2 Calgary AL 8
3 Edmonton AL 4
4 Windsor ON 1
Answer 0 (score: 2)
Use GroupBy.transform with sum, divide by the overall total with Series.div, and finally filter with boolean indexing:
df = df[df.groupby('Province')['Sales'].transform('sum').div(df['Sales'].sum()) > 0.15]
print(df)
City Province Sales
0 Toronto ON 13
2 Vancouver BC 16
3 Calgary AL 8
4 Edmonton AL 4
6 Windsor ON 1
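To make the one-liner easier to follow, here is a sketch that splits it into its three steps. The intermediate names (prov_total, prov_share, mask, filtered) are only for illustration and are not part of the answer; the data is the frame from the question, with the province codes as given there.

```python
import pandas as pd

# Sample data from the question (province codes kept as written there).
df = pd.DataFrame({
    'City': ['Toronto', 'Montreal', 'Vancouver', 'Calgary', 'Edmonton', 'Winnipeg', 'Windsor'],
    'Province': ['ON', 'QC', 'BC', 'AL', 'AL', 'MN', 'ON'],
    'Sales': [13, 6, 16, 8, 4, 3, 1],
})

# Step 1: broadcast each province's total sales back onto its own rows.
prov_total = df.groupby('Province')['Sales'].transform('sum')

# Step 2: turn the province totals into a share of overall sales.
prov_share = prov_total.div(df['Sales'].sum())

# Step 3: boolean mask, True for rows whose province exceeds 15%.
mask = prov_share > 0.15

filtered = df[mask]
print(filtered)
```

Because transform returns a Series aligned with the original index, the mask can be applied to df directly without a merge.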
Answer 1 (score: 1)
tot = df.Sales.sum()  # sum of the Sales column
df[df.groupby(['City', 'Province'])['Sales'].transform(lambda x: (x.div(tot) * 100) < 15)]  # compute the percentage and filter on the condition
Answer 2 (score: 0)
Not sure this is the most direct route, but it gives the output shown in the OP's question:
import pandas as pd

df = pd.DataFrame({
    'City': ['Toronto', 'Montreal', 'Vancouver', 'Calgary', 'Edmonton', 'Winnipeg', 'Windsor'],
    'Province': ['ON', 'QC', 'BC', 'AB', 'AB', 'MN', 'ON'],
    'Sales': [13, 6, 16, 8, 4, 3, 1],
})
# Each province's share of total sales
prov_pct = df.groupby('Province')['Sales'].sum() / df['Sales'].sum()
# Provinces whose share exceeds 15%
prov_keep = prov_pct[prov_pct > 0.15].index
df[df['Province'].isin(prov_keep)]
Output:
City Province Sales
0 Toronto ON 13
2 Vancouver BC 16
3 Calgary AB 8
4 Edmonton AB 4
6 Windsor ON 1
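The output above keeps the original row labels (0, 2, 3, 4, 6), whereas the expected frame in the question is numbered 0 to 4. If that renumbering matters, one possible addition is reset_index(drop=True); the name result below is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['Toronto', 'Montreal', 'Vancouver', 'Calgary', 'Edmonton', 'Winnipeg', 'Windsor'],
    'Province': ['ON', 'QC', 'BC', 'AB', 'AB', 'MN', 'ON'],
    'Sales': [13, 6, 16, 8, 4, 3, 1],
})

# Share of total sales per province, indexed by province code.
prov_pct = df.groupby('Province')['Sales'].sum() / df['Sales'].sum()

# Provinces whose share exceeds 15%.
prov_keep = prov_pct[prov_pct > 0.15].index

# Filter, then discard the old row labels so the index runs 0..n-1.
result = df[df['Province'].isin(prov_keep)].reset_index(drop=True)
print(result)
```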
@wwnde's answer gives this output:
df[df.groupby('Province')['Sales'].transform(lambda x: (x / tot * 100) > 15)]  # compute the percentage and filter on the condition
City Province Sales
0 Toronto ON 13
2 Vancouver BC 16
3 Calgary AB 8
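The two outputs differ because the lambda passed to transform compares each row's own sales with the total, not the province's summed sales. A small sketch of the two masks side by side, using the same data; the names row_share, prov_share, by_row and by_province are made up for the comparison:

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['Toronto', 'Montreal', 'Vancouver', 'Calgary', 'Edmonton', 'Winnipeg', 'Windsor'],
    'Province': ['ON', 'QC', 'BC', 'AB', 'AB', 'MN', 'ON'],
    'Sales': [13, 6, 16, 8, 4, 3, 1],
})
tot = df['Sales'].sum()

# Each row's own percentage of total sales (what the lambda version tests).
row_share = df['Sales'] / tot * 100

# Each row's province percentage of total sales (what the question asks for).
prov_share = df.groupby('Province')['Sales'].transform('sum') / tot * 100

by_row = df[row_share > 15]        # drops Edmonton and Windsor
by_province = df[prov_share > 15]  # keeps every city in ON, BC and AB

print(by_row)
print(by_province)
```

Edmonton and Windsor fall below 15% on their own, but their provinces (AB and ON) are above it, so only the province-level mask keeps them.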