How do I plot groupby percentages with seaborn?

Asked: 2018-12-14 15:48:27

Tags: python pandas matplotlib plot seaborn

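I have a binary classification problem that I want to solve with a RandomForestClassifier. My target column is "successful", which is either 0 or 1. I want to explore the data and see what it looks like, so I tried to make count plots by category. But these plots do not show what percentage of the total is successful (i.e. successful == 1).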

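How can I change the plot below so that the subplots show the percentage of all posts with successful == 1? (For example, in the category "weekday", for the day "Saturday" I have 10 data points, 7 of which are successful ("successful" == 1), so I want a bar of height 0.7 for that day.)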
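The Saturday example can be checked numerically before plotting anything. The sketch below uses a small made-up DataFrame (the `toy` data and the Monday rows are assumptions for illustration, not the asker's data) and relies on the fact that the mean of a 0/1 column is the share of ones.

```python
import pandas as pd

# Hypothetical toy data: 10 Saturday posts (7 successful) and
# 4 Monday posts (1 successful). Not the asker's real DataFrame.
toy = pd.DataFrame({
    "weekday": ["Saturday"] * 10 + ["Monday"] * 4,
    "successful": [1] * 7 + [0] * 3 + [1] + [0] * 3,
})

# For a 0/1 column, the group mean equals the fraction of rows
# in that group where successful == 1.
success_rate = toy.groupby("weekday")["successful"].mean()
print(success_rate)
```

The value for Saturday is the 0.7 the question asks for, so any plot that draws one bar per group mean will show the desired heights.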

Here is the current plot (counts :-/):

[image: actual count plot]

Here is part of my DataFrame:

[image: 15 samples of my df]

Here is the code that generates the current plot:

# Plot (mainDf is the DataFrame shown above)
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="darkgrid")

x_vals = [['page_name', 'weekday'],['type', 'industry']]
subtitles = [['by Page', 'by Weekday'],['by Content Type', 'by Industry']]

fig, ax = plt.subplots(2,2, figsize=(15,10))
#jitter = [[False, 1], [0.5, 0.2]]

for j in range(len(ax)):
    for i in range(len(ax[j])):
        ax[j][i].tick_params(labelsize=15)
        ax[j][i].set_xlabel('label', fontsize=17, position=(.5,20))
        if (j == 0) :
            ax[j][i].tick_params(axis="x", rotation=50) 
        ax[j][i].set_ylabel('label', fontsize=17)
        ax[j][i] = sns.countplot(x=x_vals[j][i], hue="successful", data=mainDf, ax=ax[j][i])

for j in range(len(ax)):
    for i in range(len(ax[j])):
        ax[j][i].set_xlabel('', fontsize=17)
        ax[j][i].set_ylabel('count', fontsize=17)
        ax[j][i].set_title(subtitles[j][i], fontsize=18)

fig.suptitle('Success Count by Category', position=(.5,1.05), fontsize=20)

fig.tight_layout()
fig.show()

PS: Please note that I am using Seaborn. If possible, the solution should use Seaborn as well. Thanks!

2 Answers:

Answer 0 (score: 1)

You can use barplot here. I was not 100% sure what you actually want to achieve, so I worked out several solutions.

Frequency of successful (unsuccessful) posts per total successful (unsuccessful) posts

fig, axes = plt.subplots(2, 2, figsize=(15, 10))

mainDf['frequency'] = 0 # dummy column; after count() it holds the number of rows per group
for col, ax in zip(['page_name', 'weekday', 'type', 'industry'], axes.flatten()):
    counts = mainDf.groupby([col, 'successful']).count()
    freq_per_group = counts.div(counts.groupby('successful').transform('sum')).reset_index()
    sns.barplot(x=col, y='frequency', hue='successful', data=freq_per_group, ax=ax)

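The normalisation step in the snippet above (divide each count by a total that has been broadcast back with `transform('sum')`) can be seen in isolation on a tiny example. This is a sketch under assumptions: the `toy` data is made up, and `size()` is used instead of the dummy `frequency` column, which is a substitution rather than the answer's exact code.

```python
import pandas as pd

# Hypothetical toy data, not the asker's DataFrame.
toy = pd.DataFrame({
    "weekday": ["Sat", "Sat", "Sat", "Sun", "Sun", "Sun", "Sun"],
    "successful": [1, 1, 0, 1, 0, 0, 0],
})

# Rows per (weekday, successful) pair: a Series with a two-level index.
counts = toy.groupby(["weekday", "successful"]).size()

# Total per outcome class, repeated for every row of `counts`
# so that the division below lines up element by element.
class_totals = counts.groupby(level="successful").transform("sum")

# Share of each weekday within its outcome class, as a tidy frame
# that sns.barplot could consume via x, y and hue.
freq = (counts / class_totals).reset_index(name="frequency")
print(freq)
```

Grouping by `level="weekday"` instead gives the per-group variant, and dividing by `len(toy)` gives the per-total variant.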

Frequency of successful (unsuccessful) posts per group

fig, axes = plt.subplots(2, 2, figsize=(15, 10))

mainDf['frequency'] = 0 # dummy column; after count() it holds the number of rows per group
for col, ax in zip(['page_name', 'weekday', 'type', 'industry'], axes.flatten()):
    counts = mainDf.groupby([col, 'successful']).count()
    freq_per_group = counts.div(counts.groupby(col).transform('sum')).reset_index()
    sns.barplot(x=col, y='frequency', hue='successful', data=freq_per_group, ax=ax)

For the data you provided, this gives:

[image: resulting plot for the provided data]

Frequency of successful (unsuccessful) posts per total number of posts

fig, axes = plt.subplots(2, 2, figsize=(15, 10))

mainDf['frequency'] = 0 # dummy column; after count() it holds the number of rows per group
total = len(mainDf)
for col, ax in zip(['page_name', 'weekday', 'type', 'industry'], axes.flatten()):
    counts = mainDf.groupby([col, 'successful']).count()
    freq_per_total = counts.div(total).reset_index()
    sns.barplot(x=col, y='frequency', hue='successful', data=freq_per_total, ax=ax)

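The three variants differ only in what each count is divided by. As a cross-check, `pd.crosstab` with its `normalize` argument produces the same three tables; note that `crosstab` is not what this answer uses, and the `toy` data below is made up for illustration.

```python
import pandas as pd

# Hypothetical toy data, not the asker's DataFrame.
toy = pd.DataFrame({
    "weekday": ["Sat", "Sat", "Sat", "Sun", "Sun", "Sun", "Sun"],
    "successful": [1, 1, 0, 1, 0, 0, 0],
})

# Variant 1: divide by the total of each outcome class (columns sum to 1).
per_class = pd.crosstab(toy["weekday"], toy["successful"], normalize="columns")

# Variant 2: divide by the size of each weekday group (rows sum to 1).
# The column for successful == 1 is the success rate per weekday.
per_group = pd.crosstab(toy["weekday"], toy["successful"], normalize="index")

# Variant 3: divide by the total number of rows (all cells sum to 1).
per_total = pd.crosstab(toy["weekday"], toy["successful"], normalize="all")

print(per_group)
```

Only the second variant answers the question as posed (7 of 10 on Saturday gives 0.7), because it normalises within each category value.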

Answer 1 (score: 0)

Change the line

ax[j][i] = sns.countplot(x=x_vals[j][i], hue="successful", data=mainDf, ax=ax[j][i])

to

ax[j][i] = sns.barplot(x=x_vals[j][i], y='successful', data=mainDf, ax=ax[j][i], ci=None, estimator=lambda x: sum(x) / len(x) * 100)

so that each bar shows the percentage of rows with successful == 1 in that category.

Your code should then be

import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="darkgrid")

x_vals = [['page_name', 'weekday'],['type', 'industry']]
subtitles = [['by Page', 'by Weekday'],['by Content Type', 'by Industry']]

fig, ax = plt.subplots(2,2, figsize=(15,10))
#jitter = [[False, 1], [0.5, 0.2]]

for j in range(len(ax)):
    for i in range(len(ax[j])):
        ax[j][i].tick_params(labelsize=15)
        ax[j][i].set_xlabel('label', fontsize=17, position=(.5,20))
        if (j == 0) :
            ax[j][i].tick_params(axis="x", rotation=50) 
        ax[j][i].set_ylabel('label', fontsize=17)
        ax[j][i] = sns.barplot(x=x_vals[j][i], y='successful', data=mainDf, ax=ax[j][i], ci=None, estimator=lambda x: sum(x) / len(x) * 100)

for j in range(len(ax)):
    for i in range(len(ax[j])):
        ax[j][i].set_xlabel('', fontsize=17)
        ax[j][i].set_ylabel('percent', fontsize=17)
        ax[j][i].set_title(subtitles[j][i], fontsize=18)

fig.suptitle('Success Percentage by Category', position=(.5,1.05), fontsize=20)

fig.tight_layout()
fig.show()
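A reduced version of the same idea, on made-up data, shows why this works: `sns.barplot` draws one bar per category at the value returned by its estimator, and the default estimator is the mean, which for a 0/1 column is the success fraction. The `toy` frame is an assumption for illustration. The error-bar argument is left at its default here because its name depends on the seaborn version (`ci=None` as used above, `errorbar=None` from seaborn 0.12 on).

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: 10 Saturday posts (7 successful) and
# 4 Monday posts (1 successful). Not the asker's real DataFrame.
toy = pd.DataFrame({
    "weekday": ["Saturday"] * 10 + ["Monday"] * 4,
    "successful": [1] * 7 + [0] * 3 + [1] + [0] * 3,
})

fig, ax = plt.subplots()

# Default estimator is the mean, so each bar height is the
# fraction of rows with successful == 1 for that weekday.
sns.barplot(x="weekday", y="successful", data=toy, ax=ax)
ax.set_ylabel("fraction successful")

# One rectangle per bar; read the heights back to confirm the values.
heights = [patch.get_height() for patch in ax.patches]
print(heights)
```

Call `plt.show()` (or `fig.savefig(...)`) to view the figure. Multiplying by 100 in a custom estimator, as in the answer, only changes the scale from fractions to percentages.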