How to normalize a seaborn countplot with multiple categorical variables

Time: 2018-05-09 10:59:30

Tags: python pandas matplotlib seaborn

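I created seaborn countplots for several categorical variables of a DataFrame, but I want to show percentages instead of counts.

What is the best option? Barplots? Can I use something like the loop below to get all the bar plots at once?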

for i, col in enumerate(df_categorical.columns):
   plt.figure(i)
   sns.countplot(x=col,hue='Response',data=df_categorical) 

This loop gives me a countplot for every variable in one go.

Thanks!

The data looks like this:

    State           Response     Coverage   Education   Effective To Date   EmploymentStatus       Gender   Location Code   Marital Status  Policy Type Policy    Renew Offer Type  Sales Channel   Vehicle Class   Vehicle Size    
0   Washington  No  Basic   Bachelor    2/24/11 Employed    F   Suburban    Married Corporate Auto  Corporate L3    Offer1  Agent   Two-Door Car    Medsize  
1   Arizona     No  Extended    Bachelor    1/31/11 Unemployed  F   Suburban    Single  Personal Auto   Personal L3 Offer3  Agent   Four-Door Car   Medsize
2   Nevada      No  Premium Bachelor    2/19/11 Employed    F   Suburban    Married Personal Auto   Personal L3 Offer1  Agent   Two-Door Car    Medsize
3   California  No  Basic   Bachelor    1/20/11 Unemployed  M   Suburban    Married Corporate Auto  Corporate L2    Offer1  Call Center SUV Medsize
4   Washington  No  Basic   Bachelor    2/3/11  Employed    M   Rural   Single  Personal Auto   Personal L1 Offer1  Agent   Four-Door Car   Medsize

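1 Answer:

Answer 0: (score: 0)

Consider using groupby.transform to compute a percentage column, then call barplot with the original value column as x and the percentage column as y.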
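As a minimal sketch of that idea on its own, here is the percentage calculation applied to a small toy frame. The frame `toy` and its values are made up for illustration and are not the posted data; only the `Response` and `Coverage` column names are borrowed from the question.

```python
import pandas as pd

# Toy data for illustration only (not the posted dataset)
toy = pd.DataFrame({
    "Response": ["No", "No", "Yes", "No", "Yes"],
    "Coverage": ["Basic", "Extended", "Premium", "Basic", "Basic"],
})

# Count the rows in each (Response, Coverage) group and broadcast that
# count back onto every row of the group
group_size = toy.groupby(["Response", "Coverage"])["Coverage"].transform("count")

# Express each group's size as a percentage of all rows
toy["Coverage_pct"] = group_size * 100 / len(toy)

print(toy)
```

Because transform returns a result aligned with the original rows, the new column can be handed directly to barplot as the y value.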

Data (the originally posted data, with only two "No" responses changed to "Yes")

from io import StringIO
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

txt = '''
    State           Response     Coverage   Education   "Effective To Date"   EmploymentStatus       Gender   "Location Code"   "Marital Status"  "Policy Type" Policy    "Renew Offer Type"  "Sales Channel"   "Vehicle Class"   "Vehicle Size" 
0   Washington  No  Basic   Bachelor    "2/24/11" Employed    F   Suburban    Married "Corporate Auto"  "Corporate L3"    Offer1  Agent   "Two-Door Car"    Medsize  
1   Arizona     No  Extended    Bachelor  "1/31/11"   Unemployed  F   Suburban    Single  "Personal Auto"   "Personal L3" Offer3  Agent   "Four-Door Car"   Medsize
2   Nevada      Yes  Premium Bachelor    "2/19/11" Employed    F   Suburban    Married "Personal Auto"   "Personal L3" Offer1  Agent   "Two-Door Car"    Medsize
3   California  No  Basic   Bachelor    "1/20/11" Unemployed  M   Suburban    Married "Corporate Auto"  "Corporate L2"    Offer1  "Call Center" SUV Medsize
4   Washington  Yes  Basic   Bachelor    "2/3/11"  Employed    M   Rural   Single  "Personal Auto"   "Personal L1" Offer1  Agent   "Four-Door Car"   Medsize'''

# Raw string avoids the invalid-escape warning for "\s"
df_categorical = pd.read_table(StringIO(txt), sep=r"\s+")

Plotting (a single figure with multiple plots in two columns)

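# Plot every categorical column except the hue column itself
plot_cols = [c for c in df_categorical.columns if c != 'Response']
n_rows = (len(plot_cols) + 1) // 2

fig = plt.figure(figsize=(10, 30))

for i, col in enumerate(plot_cols):
   # PERCENT COLUMN CALCULATION: size of each (Response, col) group
   # as a percentage of all rows
   df_categorical[col+'_pct'] = df_categorical.groupby(['Response', col])[col]\
                                   .transform('count') * 100 / len(df_categorical)

   # One subplot per variable, laid out in a two-column grid
   plt.subplot(n_rows, 2, i+1)
   sns.barplot(x=col, y=col+'_pct', hue='Response', data=df_categorical)\
          .set(xlabel=col, ylabel='Percent')

plt.tight_layout()
plt.show()
plt.close('all')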

Plot Output
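Note that dividing by len(df_categorical) gives each bar's share of all rows. If the goal is instead the share of each Response within a category level (so the bars for one level sum to 100), one alternative sketch, not part of the answer above, is pd.crosstab with normalize='index'. The toy frame below is made up for illustration.

```python
import pandas as pd

# Toy data for illustration only (not the posted dataset)
toy = pd.DataFrame({
    "Response": ["No", "No", "Yes", "No", "Yes"],
    "Coverage": ["Basic", "Extended", "Premium", "Basic", "Basic"],
})

# Row-normalized crosstab: within each Coverage level, the Response
# shares sum to 100
share = pd.crosstab(toy["Coverage"], toy["Response"], normalize="index") * 100

# Long format: one row per (Coverage, Response) pair
long_df = share.reset_index().melt(
    id_vars="Coverage", var_name="Response", value_name="Percent"
)

print(long_df)
```

The resulting long_df can then be passed to sns.barplot(x='Coverage', y='Percent', hue='Response', data=long_df).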