How to sort box plot values in increasing order (by median)?

Time: 2018-11-20 09:54:53

Tags: python pandas matplotlib seaborn

This is my pandas DataFrame:

Area            Gender  Quantity
XXX             Men     115
XXX             Men     105    
XXX             Men     114
YYY             Men     100
YYY             Men     90    
YYY             Men     95
YYY             Men     101
XXX             Women   120    
XXX             Women   122
XXX             Women   115
XXX             Women   117    
YYY             Women   91
YYY             Women   90
YYY             Women   90

This is how I create the box plot:

import seaborn as sns
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(15,11))
ax = sns.boxplot(x="Area", y="Quantity", hue="Gender", data=df, palette="Set3")

I would like to sort the Area/Quantity groups by median. How can I do that?

2 Answers:

Answer 0: (score: 4)

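This is not possible out of the box with the current version of seaborn (<= 0.9.0). The best you can do at the moment is to set hue_order (e.g. ['Women', 'Men']), but that order applies equally to all groups, which is not what you want.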
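To see why a single hue_order cannot work for this data, here is a minimal sketch that computes the median Quantity per (Area, Gender) pair with pandas. The names medians and per_area_order are illustrative only and are not part of seaborn.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {
        "Area": ["XXX"] * 3 + ["YYY"] * 4 + ["XXX"] * 4 + ["YYY"] * 3,
        "Gender": ["Men"] * 7 + ["Women"] * 7,
        "Quantity": [115, 105, 114, 100, 90, 95, 101,
                     120, 122, 115, 117, 91, 90, 90],
    }
)

# Median Quantity for every (Area, Gender) pair: one row per Area,
# one column per Gender
medians = df.groupby(["Area", "Gender"])["Quantity"].median().unstack("Gender")

# Hue levels sorted by increasing median, computed separately for each Area
per_area_order = {
    area: list(row.sort_values().index) for area, row in medians.iterrows()
}
print(per_area_order)
```

For this data the two areas need opposite hue orders, so any single hue_order is wrong for one of them.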

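Moreover, extending boxplot() is not that simple, because seaborn does not expose the classes responsible for plotting in its official API. See the entry point to boxplot() (permalink to seaborn master as of October 20, 2018, git hash: 84ca6c6).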

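If you don't mind using seaborn's internal objects, you can create your own sorted_boxplot() version. The simplest way to implement the sorting is to modify the following line in _BoxPlotter.draw_boxplot() (permalink, git: 84ca6c6):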

# Original
center = i + offsets[j]

# Fix:
ordered_offsets = ...
center = i + ordered_offsets[j]

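center is the position of the box, i is the index of the group, and j is the index of the current hue. I tested this by deriving from _BoxPlotter and overriding draw_boxplot(); see the full code further below.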
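As a rough illustration of the reordering idea in isolation, the sketch below assigns each hue level the offset that matches the rank of its median. The helper name ordered_offsets_by_median and the offset values are assumptions made for this example; they are not taken from seaborn.

```python
import numpy as np

def ordered_offsets_by_median(offsets, medians):
    """Hypothetical helper: give each hue level the offset matching its median rank."""
    offsets = np.asarray(offsets, dtype=float)
    # argsort of argsort yields the rank of every entry (0 = smallest median)
    ranks = np.argsort(np.argsort(medians))
    return offsets[ranks]

# Illustrative offsets for two hue levels (Men, Women), left to right
offsets = np.array([-0.2, 0.2])

# Group XXX: Men has the lower median, so the order stays as it is
xxx = ordered_offsets_by_median(offsets, [114.0, 118.5])

# Group YYY: Women has the lower median, so the two positions swap
yyy = ordered_offsets_by_median(offsets, [97.5, 90.0])
```

Using ranks rather than the raw argsort result matters once there are three or more hue levels, where the two no longer coincide in general.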

PS: It would be great if someone elaborated on this and filed a pull request against seaborn. The feature would certainly be useful.


The following works for me (Python 3.6, seaborn 0.9.0):

import numpy as np
import matplotlib.pyplot as plt  # needed for plt.gca() in sorted_boxplot()
import seaborn as sns
from seaborn.categorical import _BoxPlotter
from seaborn.utils import remove_na

class SortedBoxPlotter(_BoxPlotter):
    def __init__(self, *args, **kwargs):
        super(SortedBoxPlotter, self).__init__(*args, **kwargs)

    def draw_boxplot(self, ax, kws):
        '''
        Below code has been copied partly from seaborn.categorical.py
        and is reproduced only for educational purposes.
        '''
        if self.plot_hues is None:
            # Sorting by hue doesn't apply here. Just defer to the parent class.
            return super(SortedBoxPlotter, self).draw_boxplot(ax, kws)

        vert = self.orient == "v"
        props = {}
        for obj in ["box", "whisker", "cap", "median", "flier"]:
            props[obj] = kws.pop(obj + "props", {})

        for i, group_data in enumerate(self.plot_data):

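            # ==> Sort offsets by median
            offsets = self.hue_offsets
            medians = [ np.median(group_data[self.plot_hues[i] == h])
                        for h in self.hue_names ]
            # Rank of each hue level by median (argsort of argsort), so the
            # ordering is also correct with more than two hue levels
            ranks = np.argsort(np.argsort(medians))
            offsets_sorted = offsets[ranks]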

            # Draw nested groups of boxes
            for j, hue_level in enumerate(self.hue_names):

                # Add a legend for this hue level
                if not i:
                    self.add_legend_data(ax, self.colors[j], hue_level)

                # Handle case where there is data at this level
                if group_data.size == 0:
                    continue

                hue_mask = self.plot_hues[i] == hue_level
                box_data = remove_na(group_data[hue_mask])

                # Handle case where there is no non-null data
                if box_data.size == 0:
                    continue

                # ==> Fix ordering
                center = i + offsets_sorted[j]

                artist_dict = ax.boxplot(box_data,
                                         vert=vert,
                                         patch_artist=True,
                                         positions=[center],
                                         widths=self.nested_width,
                                         **kws)
                self.restyle_boxplot(artist_dict, self.colors[j], props)

def sorted_boxplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None,
                   orient=None, color=None, palette=None, saturation=.75,
                   width=.8, dodge=True, fliersize=5, linewidth=None,
                   whis=1.5, notch=False, ax=None, **kwargs):

    '''
    Same as sns.boxplot(), except that nested groups of boxes are plotted by
    increasing median.
    '''

    plotter = SortedBoxPlotter(x, y, hue, data, order, hue_order,
                               orient, color, palette, saturation,
                               width, dodge, fliersize, linewidth)
    if ax is None:
        ax = plt.gca()
    kwargs.update(dict(whis=whis, notch=notch))
    plotter.plot(ax, kwargs)
    return ax

To run it with the sample data:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame([ ["XXX", "Men" ,  115],
                    ["XXX", "Men" ,  105    ],
                    ["XXX", "Men" ,  114],
                    ["YYY", "Men" ,  100],
                    ["YYY", "Men" ,  90    ],
                    ["YYY", "Men" ,  95],
                    ["YYY", "Men" ,  101],
                    ["XXX", "Women", 120    ],
                    ["XXX", "Women", 122],
                    ["XXX", "Women", 115],
                    ["XXX", "Women", 117    ],
                    ["YYY", "Women", 91],
                    ["YYY", "Women", 90],
                    ["YYY", "Women", 90]],
                  columns = ["Area", "Gender", "Quantity"])
sorted_boxplot(x="Area", y="Quantity", hue="Gender", data=df, palette="Set3")
plt.show()

Result:

[Image: the resulting box plot]

Answer 1: (score: 0)

You can pass the "order" parameter to the sns.boxplot function. See https://python-graph-gallery.com/35-control-order-of-boxplot/
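A minimal sketch of this suggestion, assuming the goal is to sort the Area categories on the x axis by their overall median Quantity. Note that "order" controls the order of the x categories only, not the order of the hue boxes inside each group.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {
        "Area": ["XXX"] * 3 + ["YYY"] * 4 + ["XXX"] * 4 + ["YYY"] * 3,
        "Gender": ["Men"] * 7 + ["Women"] * 7,
        "Quantity": [115, 105, 114, 100, 90, 95, 101,
                     120, 122, 115, 117, 91, 90, 90],
    }
)

# Area categories sorted by increasing median Quantity (all genders pooled)
order = list(df.groupby("Area")["Quantity"].median().sort_values().index)
print(order)
```

The resulting list can then be passed to the plot call from the question, for example sns.boxplot(x="Area", y="Quantity", hue="Gender", data=df, order=order, palette="Set3").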