Drop rows whose group's share of the total is below a threshold

Time: 2020-08-31 04:18:22

Tags: python pandas

In this DataFrame:

        City Province  Sales
0    Toronto       ON     13
1   Montreal       QC      6
2  Vancouver       BC     16
3    Calgary       AL      8
4   Edmonton       AL      4
5   Winnipeg       MN      3
6    Windsor       ON      1

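I want to drop the rows of any province whose sales make up less than 15% of total sales. For example, in this case the resulting DataFrame would be: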

        City Province  Sales
0    Toronto       ON     13
1  Vancouver       BC     16
2    Calgary       AL      8
3   Edmonton       AL      4
4    Windsor       ON      1

3 answers:

Answer 0: (score: 2)

Use GroupBy.transform with sum, divide by the overall total with Series.div, and finally filter with boolean indexing:

df = df[df.groupby('Province')['Sales'].transform('sum').div(df['Sales'].sum()) > 0.15]
print(df)
        City Province  Sales
0    Toronto       ON     13
2  Vancouver       BC     16
3    Calgary       AL      8
4   Edmonton       AL      4
6    Windsor       ON      1
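
To see why this works, it can help to look at the intermediate Series. The following is a small sketch that rebuilds the question's frame and shows each row's province share before filtering. The names `group_total` and `share` are illustrative and do not come from the answer.

```python
import pandas as pd

# Rebuild the frame from the question (province codes copied as posted).
df = pd.DataFrame({
    'City': ['Toronto', 'Montreal', 'Vancouver', 'Calgary',
             'Edmonton', 'Winnipeg', 'Windsor'],
    'Province': ['ON', 'QC', 'BC', 'AL', 'AL', 'MN', 'ON'],
    'Sales': [13, 6, 16, 8, 4, 3, 1],
})

# transform('sum') returns one value per row: the total of that row's province.
group_total = df.groupby('Province')['Sales'].transform('sum')

# Dividing by the grand total (51 here) gives each province's share,
# repeated on every row of that province.
share = group_total.div(df['Sales'].sum())

# Boolean indexing keeps the rows whose province share exceeds 15%.
result = df[share > 0.15]

print(df.assign(ProvinceShare=share.round(3)))
print(result)
```

With this data, QC (6 of 51, about 11.8%) and MN (3 of 51, about 5.9%) fall below the threshold, so Montreal and Winnipeg are dropped, while Windsor stays because ON as a whole accounts for 14 of 51.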

Answer 1: (score: 1)

tot = df.Sales.sum()  # total sales across all provinces
# keep rows whose province total is at least 15% of overall sales
df[df.groupby('Province')['Sales'].transform(lambda x: x.sum() / tot * 100 >= 15)]

Answer 2: (score: 0)

Not sure this is the most direct route, but it gives the output shown in the OP's question:

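import pandas as pd

df = pd.DataFrame({
    'City': ['Toronto', 'Montreal', 'Vancouver', 'Calgary', 'Edmonton', 'Winnipeg', 'Windsor'],
    'Province': ['ON', 'QC', 'BC', 'AB', 'AB', 'MN', 'ON'],
    'Sales': [13, 6, 16, 8, 4, 3, 1]
})

# share of total sales per province, indexed by province code
prov_pct = df.groupby('Province')['Sales'].sum() / df['Sales'].sum()
# provinces whose share exceeds 15%
prov_keep = prov_pct[prov_pct > 0.15].index
# keep only the rows that belong to those provinces
df[df['Province'].isin(prov_keep)]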

Output:

        City Province  Sales
0    Toronto       ON     13
2  Vancouver       BC     16
3    Calgary       AB      8
4   Edmonton       AB      4
6    Windsor       ON      1
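
The same selection can also be written with `GroupBy.filter`, which keeps or drops whole groups based on a predicate. This is a sketch of an alternative rather than part of the answer above. The helper name `drop_small_groups` and its parameters are made up for illustration.

```python
import pandas as pd

def drop_small_groups(frame, group_col, value_col, min_share=0.15):
    """Hypothetical helper: keep rows whose group holds more than
    min_share of the column total."""
    total = frame[value_col].sum()
    # filter() evaluates the predicate once per group and keeps every row
    # of the groups for which it returns True.
    return frame.groupby(group_col).filter(
        lambda g: g[value_col].sum() / total > min_share
    )

df = pd.DataFrame({
    'City': ['Toronto', 'Montreal', 'Vancouver', 'Calgary',
             'Edmonton', 'Winnipeg', 'Windsor'],
    'Province': ['ON', 'QC', 'BC', 'AB', 'AB', 'MN', 'ON'],
    'Sales': [13, 6, 16, 8, 4, 3, 1],
})

kept = drop_small_groups(df, 'Province', 'Sales')
print(kept)
```

On large frames the `transform('sum')` or `isin` forms are likely to be faster, since `filter` calls a Python function for each group. The `filter` form may read more directly when the condition is naturally a statement about the whole group.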

By comparison, this version of @wwnde's answer outputs:

df[df.groupby('Province')['Sales'].transform(lambda x: (x / tot*100) > 15)]  #calculate percentage filter as per condition

        City Province  Sales
0    Toronto       ON     13
2  Vancouver       BC     16
3    Calgary       AB      8
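
The shorter output comes from what is compared against 15%: the lambda divides each row's own sales by the total, so the province sum never enters the test. Below is a short sketch contrasting the two masks on the `AB` frame from this answer. The names `row_mask` and `group_mask` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['Toronto', 'Montreal', 'Vancouver', 'Calgary',
             'Edmonton', 'Winnipeg', 'Windsor'],
    'Province': ['ON', 'QC', 'BC', 'AB', 'AB', 'MN', 'ON'],
    'Sales': [13, 6, 16, 8, 4, 3, 1],
})
tot = df['Sales'].sum()  # 51

# Per-row share: each city's own sales relative to the grand total.
# This matches the elementwise lambda, which never aggregates within a group.
row_mask = df['Sales'] / tot * 100 > 15

# Per-province share: the province total relative to the grand total.
group_mask = df.groupby('Province')['Sales'].transform('sum') / tot * 100 > 15

print(df[row_mask].index.tolist())    # [0, 2, 3]
print(df[group_mask].index.tolist())  # [0, 2, 3, 4, 6]
```

Edmonton (4 of 51) and Windsor (1 of 51) are each below 15% on their own, which is why they disappear under the per-row test even though AB and ON pass as provinces.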