How to add percentages on top of bars in seaborn?

Time: 2015-07-31 15:04:38

Tags: python matplotlib seaborn

Given the following count plot, how do I place percentages on top of the bars?

import seaborn as sns
sns.set(style="darkgrid")
titanic = sns.load_dataset("titanic")
ax = sns.countplot(x="class", hue="who", data=titanic)

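For example, for "First" I want total First men / total First, total First women / total First, and total First children / total First on top of their respective bars.

Please let me know if my explanation is not clear.

Thanks!

5 answers:

Answer 0 (score: 40)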

sns.barplot doesn't explicitly return the barplot values the way matplotlib.pyplot.bar does (see last para), but if you've plotted nothing else you can risk assuming that all the patches in the axes are your values. Then you can use the sub-totals that the barplot function has calculated for you:

from matplotlib.pyplot import show
import seaborn as sns
sns.set(style="darkgrid")
titanic = sns.load_dataset("titanic")
total = float(len(titanic)) # one person per row 
#ax = sns.barplot(x="class", hue="who", data=titanic)
ax = sns.countplot(x="class", hue="who", data=titanic) # for Seaborn version 0.7 and more
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}'.format(height/total),
            ha="center") 
show()

This produces a count plot with each bar labeled by its fraction of all passengers.

An alternative approach is to do the sub-summing explicitly, e.g. with the excellent pandas, plot with matplotlib, and do the styling yourself. (Though you can get quite a lot of styling from the sns context even when using matplotlib plotting functions. Try it out.)
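A minimal sketch of the explicit sub-summing idea, assuming pandas is available. The tiny DataFrame and the helper name `shares_within` are made up for illustration; with the real data you would pass `titanic` instead. `pd.crosstab(..., normalize="index")` divides each row by its row total, which gives the within-class fractions the question asks for:

```python
import pandas as pd

def shares_within(df, x, hue):
    """Fraction of each `hue` value within every `x` category (rows sum to 1)."""
    return pd.crosstab(df[x], df[hue], normalize="index")

# Hypothetical stand-in for the titanic data, small enough to check by hand.
toy = pd.DataFrame({
    "class": ["First", "First", "First", "First", "Third", "Third"],
    "who":   ["man",   "man",   "woman", "child", "man",   "woman"],
})

shares = shares_within(toy, "class", "who")
print(shares)
```

The resulting table could then be drawn with `shares.plot(kind="bar")` and labeled with the same kind of loop over `ax.patches` as in the code above.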

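Answer 1 (score: 4)

If your plot uses the "hue" parameter, the with_hue function will draw percentages on the bars. It takes the actual plot, the feature, the number of categories in the feature, and the number of hue categories (the number of categories in the hue feature) as parameters.

If you have a normal plot without hue, the without_hue function will draw percentages on the bars. It takes the actual plot and the feature as parameters.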

import matplotlib.pyplot as plt

def with_hue(plot, feature, Number_of_categories, hue_categories):
    # Patches are stored hue by hue: all bars of the first hue, then the second, ...
    a = [p.get_height() for p in plot.patches]
    patch = [p for p in plot.patches]
    for i in range(Number_of_categories):
        # value_counts() sorts by frequency, so this assumes the bars were drawn
        # in that same order (e.g. order=feature.value_counts().index).
        total = feature.value_counts().values[i]
        for j in range(hue_categories):
            bar = patch[j*Number_of_categories + i]
            percentage = '{:.1f}%'.format(100 * a[j*Number_of_categories + i]/total)
            x = bar.get_x() + bar.get_width() / 2 - 0.15
            y = bar.get_y() + bar.get_height()
            # Annotate the plot that was passed in rather than a global ax.
            plot.annotate(percentage, (x, y), size = 12)
    plt.show()

def without_hue(plot, feature):
    total = len(feature)
    for p in plot.patches:
        percentage = '{:.1f}%'.format(100 * p.get_height()/total)
        x = p.get_x() + p.get_width() / 2 - 0.05
        y = p.get_y() + p.get_height()
        plot.annotate(percentage, (x, y), size = 12)
    plt.show()

Answer 2 (score: 1)

Inspired by the answers from jrjc and cphlewis above, but simpler and easier to understand.

import matplotlib.pyplot as plt
import seaborn as sns

# train_df is the answerer's own DataFrame with an "event" column.
sns.set(style="whitegrid")
plt.figure(figsize=(8,5))
total = float(len(train_df))
ax = sns.countplot(x="event", hue="event", data=train_df)
plt.title('Data provided for each event', fontsize=20)
for p in ax.patches:
    percentage = '{:.1f}%'.format(100 * p.get_height()/total)
    x = p.get_x() + p.get_width() / 2  # horizontal center of the bar
    y = p.get_height()
    ax.annotate(percentage, (x, y), ha='center')
plt.show()

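Answer 3 (score: 0)

With the help of cphlewis's solution, I managed to put the correct percentages on top of the bars, so that the percentages within each category sum to one.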

import matplotlib.pyplot as plt
import seaborn as sns

# data, categorical (a list of column names) and plot_count come from the
# answerer's own script and are not defined here.
for index, category in enumerate(categorical):
    plt.subplot(plot_count, 1, index + 1)

    order = sorted(data[category].unique())
    ax = sns.countplot(x=category, data=data, hue="churn", order=order)
    ax.set_ylabel('')

    # With two hue levels, the first half of the patches belongs to one hue
    # and the second half to the other.
    bars = ax.patches
    half = len(bars) // 2
    left_bars = bars[:half]
    right_bars = bars[half:]

    for left, right in zip(left_bars, right_bars):
        height_l = left.get_height()
        height_r = right.get_height()
        total = height_l + height_r

        ax.text(left.get_x() + left.get_width()/2., height_l + 40, '{0:.0%}'.format(height_l/total), ha="center")
        ax.text(right.get_x() + right.get_width()/2., height_r + 40, '{0:.0%}'.format(height_r/total), ha="center")

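However, this solution assumes there are 2 options (man, woman) rather than 3 (man, woman, child).

Because Axes.patches is ordered in a peculiar way (first all the blue bars, then all the green bars, then all the red bars), you have to split them up and zip them back together accordingly.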
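The ordering described above can be illustrated without any plotting. This sketch assumes the hue-by-hue layout holds (all bars of the first hue, then all of the second, and so on) and uses made-up counts; `regroup` is a hypothetical helper name. It slices the flat list into one chunk per hue and zips the chunks so that each tuple holds the bars of one x category:

```python
def regroup(values, n_hues):
    """Turn a hue-major flat list into one tuple per x category."""
    n_categories = len(values) // n_hues
    per_hue = [values[k * n_categories:(k + 1) * n_categories]
               for k in range(n_hues)]
    return list(zip(*per_hue))

# Made-up heights for 3 classes x 3 hues, stored the way ax.patches would be:
# [man@First, man@Second, man@Third, woman@First, ..., child@Third]
heights = [60, 50, 90, 30, 40, 5, 10, 10, 5]

for group in regroup(heights, n_hues=3):
    total = sum(group)
    print([f"{h / total:.0%}" for h in group])
```

With real bars, pass `list(ax.patches)` instead of `heights` and call `get_height()` on each element of a group.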

Answer 4 (score: 0)

I could not get these methods to work with more than 2 hue categories.

I used @Lord Zsolt's approach and extended it to any number of hue categories.

def barPerc(df,xVar,ax):
    '''
    barPerc(): Add percentage for hues to bar plots
    args:
        df: pandas dataframe
        xVar: (string) X variable 
        ax: Axes object (for Seaborn Countplot/Bar plot or
                         pandas bar plot)
    '''
    # 1. how many X categories
    ##   check for NaN and remove
    numX=len([x for x in df[xVar].unique() if x==x])

    # 2. The bars are created in hue order, organize them
    bars = ax.patches
    ## 2a. For each X variable
    for ind in range(numX):
        ## 2b. Get every hue bar
        ##     ex. 8 X categories, 4 hues =>
        ##    [0, 8, 16, 24] are hue bars for 1st X category
        hueBars=bars[ind:][::numX]
        ## 2c. Get the total height (for percentages)
        total = sum([x.get_height() for x in hueBars])

        # 3. Print the percentage on the bars
        for bar in hueBars:
            ax.text(bar.get_x() + bar.get_width()/2.,
                    bar.get_height(),
                    f'{bar.get_height()/total:.0%}',
                    ha="center",va="bottom")

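As you can see, this approach does what the original poster asked for:

"I want total First men / total First, total First women / total First, and total First children / total First on top of their respective bars."

That is, the values added are the percentages of each hue within each X category, so that for each X category the percentages add up to 100%.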


(This also works for Seaborn's .barplot())
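The index arithmetic in barPerc relies on every x/hue combination producing exactly one patch. As a hedged alternative, the sketch below groups bars by the nearest integer x position instead, assuming a vertical categorical plot where category i is centered at x = i and the hue bars are dodged less than 0.5 away from it. `bar_perc_by_position` is a hypothetical name, and the NaN/zero-height filter is a precaution against empty combinations or placeholder patches, not documented seaborn behavior:

```python
import math
from collections import defaultdict

def bar_perc_by_position(ax):
    """Label each bar with its share of the bars around the same x tick."""
    groups = defaultdict(list)
    for bar in ax.patches:
        height = bar.get_height()
        # Skip empty or placeholder bars (assumption: these carry no count).
        if math.isnan(height) or height == 0:
            continue
        center = bar.get_x() + bar.get_width() / 2.0
        groups[round(center)].append(bar)

    for bars in groups.values():
        total = sum(b.get_height() for b in bars)
        for b in bars:
            ax.text(b.get_x() + b.get_width() / 2.0,
                    b.get_height(),
                    f"{b.get_height() / total:.0%}",
                    ha="center", va="bottom")
```

It would be called as `bar_perc_by_position(ax)` on the Axes returned by `sns.countplot(...)`; unlike barPerc it needs neither the DataFrame nor the name of the x variable.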
