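I created a seaborn countplot for several categorical variables in a DataFrame, but I want to show percentages instead of counts.
What is the best option? Barplots? Can I use a query like the one below to get the bar plots for all variables at once?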
for i, col in enumerate(df_categorical.columns):
    plt.figure(i)
    sns.countplot(x=col, hue='Response', data=df_categorical)
This query gives me a countplot for every variable in one go.
Thanks!
The data looks like this:
  | State      | Response | Coverage | Education | Effective To Date | EmploymentStatus | Gender | Location Code | Marital Status | Policy Type    | Policy       | Renew Offer Type | Sales Channel | Vehicle Class | Vehicle Size
0 | Washington | No       | Basic    | Bachelor  | 2/24/11           | Employed         | F      | Suburban      | Married        | Corporate Auto | Corporate L3 | Offer1           | Agent         | Two-Door Car  | Medsize
1 | Arizona    | No       | Extended | Bachelor  | 1/31/11           | Unemployed       | F      | Suburban      | Single         | Personal Auto  | Personal L3  | Offer3           | Agent         | Four-Door Car | Medsize
2 | Nevada     | No       | Premium  | Bachelor  | 2/19/11           | Employed         | F      | Suburban      | Married        | Personal Auto  | Personal L3  | Offer1           | Agent         | Two-Door Car  | Medsize
3 | California | No       | Basic    | Bachelor  | 1/20/11           | Unemployed       | M      | Suburban      | Married        | Corporate Auto | Corporate L2 | Offer1           | Call Center   | SUV           | Medsize
4 | Washington | No       | Basic    | Bachelor  | 2/3/11            | Employed         | M      | Rural         | Single         | Personal Auto  | Personal L1  | Offer1           | Agent         | Four-Door Car | Medsize
Answer (score: 0)
Consider using groupby.transform to compute a percent column, then run barplot with x set to the original value column and y set to the percent column.
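As a minimal sketch of the percent-column step on its own, here it is on a hypothetical five-row toy frame (not the posted data). transform('size') gives every row the size of its (Response, value) group, aligned to the original index, so dividing by the total row count yields each group's share of all rows. Scaling by 100 is an assumption added here so the axis reads as a percentage rather than a fraction.

```python
import pandas as pd

# Hypothetical toy data, only for illustrating the calculation
df = pd.DataFrame({
    "Response": ["No", "No", "Yes", "No", "Yes"],
    "Gender":   ["F",  "F",  "F",   "M",  "M"],
})

col = "Gender"
# For every row: how many rows share its (Response, Gender) combination
group_size = df.groupby(["Response", col])[col].transform("size")

# Share of all rows, scaled to 0-100 (multiply first to keep the division exact)
df[col + "_pct"] = group_size * 100 / len(df)
print(df)
```

Each row carries its group's percentage, so rows in the same group repeat the same value. Because barplot's default estimator is the mean, averaging those identical values leaves the percentage unchanged.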
Data (the originally posted data, with two "No" values changed to "Yes"):
from io import StringIO
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
txt = '''
State Response Coverage Education "Effective To Date" EmploymentStatus Gender "Location Code" "Marital Status" "Policy Type" Policy "Renew Offer Type" "Sales Channel" "Vehicle Class" "Vehicle Size"
0 Washington No Basic Bachelor "2/24/11" Employed F Suburban Married "Corporate Auto" "Corporate L3" Offer1 Agent "Two-Door Car" Medsize
1 Arizona No Extended Bachelor "1/31/11" Unemployed F Suburban Single "Personal Auto" "Personal L3" Offer3 Agent "Four-Door Car" Medsize
2 Nevada Yes Premium Bachelor "2/19/11" Employed F Suburban Married "Personal Auto" "Personal L3" Offer1 Agent "Two-Door Car" Medsize
3 California No Basic Bachelor "1/20/11" Unemployed M Suburban Married "Corporate Auto" "Corporate L2" Offer1 "Call Center" SUV Medsize
4 Washington Yes Basic Bachelor "2/3/11" Employed M Rural Single "Personal Auto" "Personal L1" Offer1 Agent "Four-Door Car" Medsize'''
df_categorical = pd.read_table(StringIO(txt), sep=r"\s+")  # raw string avoids an invalid-escape warning
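The parsing above relies on two pandas behaviours. Quoted multi-word fields stay intact even though whitespace is the separator. And because the header row has one fewer field than the data rows, pandas uses the first field of each row as the index. A small sketch on hypothetical two-row data shows both:

```python
from io import StringIO

import pandas as pd

# Hypothetical toy data: 2 header names, 3 fields per data row
txt = '''State "Vehicle Class"
0 Washington "Two-Door Car"
1 Arizona "Four-Door Car"'''

df = pd.read_table(StringIO(txt), sep=r"\s+")
print(df.index.tolist())    # first field of each row became the index
print(df.columns.tolist())  # quoted multi-word header stayed one column
```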
Plotting (a single figure with multiple plots arranged in two columns):
fig = plt.figure(figsize=(10, 30))
# Skip 'Response' itself: it is the hue, so plotting it against itself adds nothing
features = [c for c in df_categorical.columns if c != 'Response']
for i, col in enumerate(features):
    # PERCENT COLUMN CALCULATION: rows in each (Response, value) group / all rows * 100
    df_categorical[col + '_pct'] = df_categorical.groupby(['Response', col])[col]\
        .transform('size') * 100 / len(df_categorical)
    plt.subplot(8, 2, i + 1)
    sns.barplot(x=col, y=col + '_pct', hue='Response', data=df_categorical)\
        .set(xlabel=col, ylabel='Percent')
plt.tight_layout()
plt.show()
plt.clf()
plt.close('all')
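A possible alternative, sketched here under stated assumptions rather than taken from the answer: pd.crosstab with normalize="all" builds the same "share of all rows" table for one variable directly, and pandas' own DataFrame.plot.bar can draw it as grouped bars without adding helper columns to the data. The toy frame, the helper name percent_table, and the grid sizing via math.ceil are all hypothetical illustrations; the Agg backend is set only so the sketch runs without a display.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy stand-in for df_categorical
df_categorical = pd.DataFrame({
    "Response": ["No", "No", "Yes", "No", "Yes"],
    "Coverage": ["Basic", "Extended", "Premium", "Basic", "Basic"],
    "Gender":   ["F", "F", "F", "M", "M"],
})

def percent_table(df, col, hue="Response"):
    """Hypothetical helper: rows = values of col, columns = values of hue,
    cells = percent of all rows in the frame."""
    return pd.crosstab(df[col], df[hue], normalize="all") * 100

features = [c for c in df_categorical.columns if c != "Response"]
ncols = 2
nrows = math.ceil(len(features) / ncols)  # grid grows with the number of variables
fig, axes = plt.subplots(nrows, ncols, figsize=(10, 4 * nrows), squeeze=False)

for ax, col in zip(axes.flat, features):
    # One grouped bar chart per variable; legend entries are the Response values
    percent_table(df_categorical, col).plot.bar(ax=ax, rot=0)
    ax.set(xlabel=col, ylabel="Percent")

# Hide any unused panels when the number of variables is odd
for ax in axes.flat[len(features):]:
    ax.set_visible(False)

fig.tight_layout()
```

With normalize="columns", crosstab instead gives the percentage within each Response group (each column sums to 100), which may be the more useful comparison when "Yes" and "No" are very unbalanced. Call plt.show() or fig.savefig(...) to display or save the figure.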