I have a Pandas DataFrame with three columns -
Vendor Product Category
VendorA ProdABC B
VendorA ProdXYZ C
VendorAB ProdCDC A
VendorAB ProdDEF A
VendorAB ProdKLM B
VendorF ProdXYZ D
VendorC ProdBSE C
VendorF ProdFGH D
VendorAB ProdMNO D
VendorA ProdFGH D
VendorV ProdCDC A
VendorF ProdBSE C
I need -
How can I do this?
Answer 0 (score: 1)
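I will use the DataFrame shown in your question.
For the first task, use groupby.size to count the rows per vendor, then sort the counts in descending order and keep the first ten: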
top10=df.groupby('Vendor').size().sort_values(ascending=False).head(10)
print(top10)
Vendor
VendorAB 3
VendorA 2
VendorF 1
dtype: int64
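A minimal, self-contained sketch of this first step, assuming the data is the first six rows of the question's table (those rows reproduce the counts printed above). The value_counts line is an alternative added here for comparison and is not part of the original answer.

```python
import pandas as pd

# Assumption: the first six rows of the question's table.
rows = [
    ("VendorA", "ProdABC", "B"),
    ("VendorA", "ProdXYZ", "C"),
    ("VendorAB", "ProdCDC", "A"),
    ("VendorAB", "ProdDEF", "A"),
    ("VendorAB", "ProdKLM", "B"),
    ("VendorF", "ProdXYZ", "D"),
]
df = pd.DataFrame(rows, columns=["Vendor", "Product", "Category"])

# Count rows per vendor, largest first, and keep at most ten vendors.
top10 = df.groupby("Vendor").size().sort_values(ascending=False).head(10)

# Alternative: value_counts already sorts counts in descending order.
top10_alt = df["Vendor"].value_counts().head(10)

print(top10)
```

Both variants return a Series indexed by vendor name, so top10.index can be passed directly to isin in the next step.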
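Next, keep only the rows that belong to the top 10 vendors (using isin), group them by Vendor and Category, count the products in each group, and use unstack to turn the categories into columns: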
top10_by_categories=df[df['Vendor'].isin(top10.index)].groupby(['Vendor','Category']).count()['Product'].unstack()
categories=top10_by_categories.columns
top10_by_categories['total']=top10_by_categories.sum(axis=1)
top10_by_categories.sort_values(by='total',ascending=False,inplace=True)
print(top10_by_categories)
top10_by_categories[categories].plot(kind='bar',stacked=True)
Category    A    B    C    D  total
Vendor
VendorAB  2.0  1.0  NaN  NaN    3.0
VendorA   NaN  1.0  1.0  NaN    2.0
VendorF   NaN  NaN  NaN  1.0    1.0
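The second step can be sketched in a self-contained way as follows, again assuming the first six rows of the question's table. This is a variant, not the answer's exact code: it uses size() with unstack(fill_value=0) so that vendor/category pairs with no products show up as 0 instead of NaN, and it orders the rows by their total without adding a total column. The names top_vendors, subset, counts, and order are introduced here for illustration only.

```python
import pandas as pd

# Assumption: the first six rows of the question's table.
rows = [
    ("VendorA", "ProdABC", "B"),
    ("VendorA", "ProdXYZ", "C"),
    ("VendorAB", "ProdCDC", "A"),
    ("VendorAB", "ProdDEF", "A"),
    ("VendorAB", "ProdKLM", "B"),
    ("VendorF", "ProdXYZ", "D"),
]
df = pd.DataFrame(rows, columns=["Vendor", "Product", "Category"])

# Names of the (at most) ten vendors with the most rows.
top_vendors = df["Vendor"].value_counts().head(10).index

# Keep only those vendors, then count rows per (Vendor, Category) pair.
subset = df[df["Vendor"].isin(top_vendors)]
counts = subset.groupby(["Vendor", "Category"]).size().unstack(fill_value=0)

# Order vendors by their total count, largest first.
order = counts.sum(axis=1).sort_values(ascending=False).index
counts = counts.loc[order]

print(counts)
```

Calling counts.plot(kind='bar', stacked=True) on the result should give the same kind of stacked bar chart as in the answer, provided matplotlib is installed.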