Seaborn的一半(未拆分!)小提琴积木

时间:2018-12-20 16:22:00

标签: python-3.x pandas seaborn

根据split=True变量,当前seaborn通过设置hue提供functionality for split violinplots。我想绘制一个“半”小提琴图,即省略了每个小提琴的一半的图。对于每个连续变量,此类图描绘了类似于pdf的内容,仅绘制在每个类别变量的每个垂直线的一侧。

我设法诱骗seaborn使用绘制的值范围之外的额外数据点和额外的虚拟色调来绘制此图形,但是我想知道是否可以在不实际更改数据集的情况下完成此操作,例如sns.violinplot()参数中。

例如,此图:

enter image description here

由以下代码段创建:

# imports
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# load dataset from seaborn
datalist = sns.get_dataset_names()
dataset_name = 'iris'
if dataset_name in datalist:
    df = sns.load_dataset(dataset_name)
else:
    print("Dataset with name: " + dataset_name + " was not found in the available datasets online by seaborn.")

# prepare data
df2 = df.append([-999,-999,-999,-999,'setosa'])
df2['huecol'] = 0.0
df2['huecol'].iloc[-1]= -999

# plot
fig = plt.figure(figsize=(6,6))
sns.violinplot(x='species',y="sepal_width",
            split=True, hue ='huecol', inner = 'quartile',
            palette="pastel", data=df2, legend=False)
plt.title('iris')

# remove hue legend
leg = plt.gca().legend()
leg.remove()
plt.ylim([1,5.0])
plt.show()

2 个答案:

答案 0 :(得分:3)

答案很简单,不,对于Seaborn,如果不欺骗就认为存在hue是不可能的。

This answer展示了如何在matplotlib中执行此操作,并且原则上也可以将其应用于海底小提琴图,即切出小提琴路径的一半。

答案 1 :(得分:1)

我正在寻找与此类似的解决方案,但没有找到任何令人满意的解决方案。我最终多次调用 seaborn.kdeplot,因为 violinplot 本质上是一个单边核密度图。

示例

categorical_kde_plot 下面的函数定义

categorical_kde_plot(
    df,
    variable="tip",
    category="day",
    category_order=["Thur", "Fri", "Sat", "Sun"],
    horizontal=False,
)

使用 horizontal=True,输出将如下所示:

代码

import seaborn as sns
from matplotlib import pyplot as plt


def categorical_kde_plot(
    df,
    variable,
    category,
    category_order=None,
    horizontal=False,
    rug=True,
    figsize=None,
):
    """Draw a categorical KDE plot

    Parameters
    ----------
    df: pd.DataFrame
        The data to plot
    variable: str
        The column in the `df` to plot (continuous variable)
    category: str
        The column in the `df` to use for grouping (categorical variable)
    horizontal: bool
        If True, draw density plots horizontally. Otherwise, draw them
        vertically.
    rug: bool
        If True, add also a sns.rugplot.
    figsize: tuple or None
        If None, use default figsize of (7, 1*len(categories))
        If tuple, use that figsize. Given to plt.subplots as an argument.
    """
    if category_order is None:
        categories = list(df[category].unique())
    else:
        categories = category_order[:]

    figsize = (7, 1.0 * len(categories))

    fig, axes = plt.subplots(
        nrows=len(categories) if horizontal else 1,
        ncols=1 if horizontal else len(categories),
        figsize=figsize[::-1] if not horizontal else figsize,
        sharex=horizontal,
        sharey=not horizontal,
    )

    for i, (cat, ax) in enumerate(zip(categories, axes)):
        sns.kdeplot(
            data=df[df[category] == cat],
            x=variable if horizontal else None,
            y=None if horizontal else variable,
            # kde kwargs
            bw_adjust=0.5,
            clip_on=False,
            fill=True,
            alpha=1,
            linewidth=1.5,
            ax=ax,
            color="lightslategray",
        )

        keep_variable_axis = (i == len(fig.axes) - 1) if horizontal else (i == 0)

        if rug:
            sns.rugplot(
                data=df[df[category] == cat],
                x=variable if horizontal else None,
                y=None if horizontal else variable,
                ax=ax,
                color="black",
                height=0.025 if keep_variable_axis else 0.04,
            )

        _format_axis(
            ax,
            cat,
            horizontal,
            keep_variable_axis=keep_variable_axis,
        )

    plt.tight_layout()
    plt.show()


def _format_axis(ax, category, horizontal=False, keep_variable_axis=True):

    # Remove the axis lines
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)

    if horizontal:
        ax.set_ylabel(None)
        lim = ax.get_ylim()
        ax.set_yticks([(lim[0] + lim[1]) / 2])
        ax.set_yticklabels([category])
        if not keep_variable_axis:
            ax.get_xaxis().set_visible(False)
            ax.spines["bottom"].set_visible(False)
    else:
        ax.set_xlabel(None)
        lim = ax.get_xlim()
        ax.set_xticks([(lim[0] + lim[1]) / 2])
        ax.set_xticklabels([category])
        if not keep_variable_axis:
            ax.get_yaxis().set_visible(False)
            ax.spines["left"].set_visible(False)


if __name__ == "__main__":
    df = sns.load_dataset("tips")

    categorical_kde_plot(
        df,
        variable="tip",
        category="day",
        category_order=["Thur", "Fri", "Sat", "Sun"],
        horizontal=True,
    )