Question

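I am seeing some weird behaviour in matplotlib's boxplot function when I use the "notch" shape. I am using code I wrote a while ago and never had these problems before. I wonder what the problem is. Any ideas?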

(Image: weird behaviour on notched boxplots)

When I turn off the notch shape, it looks normal:

(Image: unnotched boxplots look normal)

Here is the code:

import matplotlib.pyplot as plt

def boxplot_modified(data):

    fig = plt.figure(figsize=(8,6))
    ax = plt.subplot(111) 

    bplot = plt.boxplot(data, 
            #notch=True,          # notch shape (uncomment to get notched boxes)
            vert=True,           # vertical box alignment
            sym='ko',            # black circles for outliers
            patch_artist=True,   # fill with color
            )   

    # choosing custom colors to fill the boxes
    colors = 3*['lightgreen'] + 5*['lightblue']   # 3 green boxes, then blue ones (8 boxes in total)
    for patch, color in zip(bplot['boxes'], colors):
        patch.set_facecolor(color)

    # modifying the whiskers: straight lines, black, wider
    for whisker in bplot['whiskers']:
        whisker.set(color='black', linewidth=1.2, linestyle='-')    

    # making the caps a little bit wider 
    for cap in bplot['caps']:
        cap.set(linewidth=1.2)

    # hiding axis ticks
    plt.tick_params(axis="both", which="both", bottom=False, top=False,
            labelbottom=True, left=False, right=False, labelleft=True)

    # adding horizontal grid lines 
    ax.yaxis.grid(True) 

    # remove axis spines
    ax.spines["top"].set_visible(False)  
    ax.spines["right"].set_visible(False) 
    ax.spines["bottom"].set_visible(True) 
    ax.spines["left"].set_visible(True)

    n_boxes = len(bplot['boxes'])   # one box per column of data (len(data) would count rows)
    plt.xticks(range(1, n_boxes + 1), n_boxes*['x'])

    # raised title
    #plt.text(2, 1, 'Modified',
    #     horizontalalignment='center',
    #     fontsize=18)

    plt.tight_layout()
    plt.show()

boxplot_modified(df.values)   # df: the pandas DataFrame being plotted, one column per box

The problem persists even when I make a simple plot without any customization:

(Image: notched plot without customization still looks weird)

Answer 1

OK, it turns out that this is actually the correct behaviour ;)

From Wikipedia:

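Notched box plots apply a "notch" or narrowing of the box around the median. Notches are useful in offering a rough guide to the significance of the difference of medians; if the notches of two boxes do not overlap, this offers evidence of a statistically significant difference between the medians. The width of the notches is proportional to the interquartile range of the sample and inversely proportional to the square root of the size of the sample. However, there is uncertainty about the most appropriate multiplier (as this may vary depending on the similarity of the variances of the samples). One convention is to use +/-1.58*IQR/sqrt(n).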
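As a quick sanity check of the formula quoted above, here is a minimal NumPy sketch that computes the notch limits median +/- k*IQR/sqrt(n) for a small sample. The helper name `notch_limits` is hypothetical; `k=1.58` follows the Wikipedia convention, while matplotlib's default (non-bootstrapped) notches use 1.57, which does not change the conclusion.

```python
import numpy as np

def notch_limits(sample, k=1.58):
    """Return (low, high) notch limits: median +/- k * IQR / sqrt(n).

    Hypothetical helper; k=1.58 is the convention quoted from Wikipedia.
    """
    x = np.asarray(sample, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])   # box edges and median
    half_width = k * (q3 - q1) / np.sqrt(x.size)   # shrinks like 1/sqrt(n)
    return med - half_width, med + half_width

# Small sample: IQR is wide relative to sqrt(n), so the notch is wide as well
small = [1, 2, 3, 4, 5, 6, 7, 8]
low, high = notch_limits(small)
print(low, high)
```

For this 8-point sample the box spans Q1 = 2.75 to Q3 = 6.25, while the notch limits come out at roughly 2.54 and 6.46, so the notch reaches past both ends of the box. That is exactly the "flipped" look from the question.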

This was also discussed in an issue on GitHub; R produces similar output, which is evidence that this behaviour is correct.

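So if we get this weird "flipped" appearance in a notched box plot, it simply means that the confidence interval of the median is wider than the box: the lower end of the notch lies below the first quartile and/or the upper end lies above the third quartile. Although it looks ugly, it is actually useful information about how (un)certain the estimate of the median is.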
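To check whether a dataset will produce the flipped look without drawing it, one option is `matplotlib.cbook.boxplot_stats`, which returns the statistics that `boxplot` draws, including the notch limits `cilo` and `cihi`. A sketch, where `notch_is_flipped` is a hypothetical helper name:

```python
import numpy as np
from matplotlib import cbook

def notch_is_flipped(sample, bootstrap=None):
    """True if the notch (median confidence interval) reaches past the box."""
    # boxplot_stats returns one dict of statistics per dataset; take the only one
    stats = cbook.boxplot_stats(np.asarray(sample, dtype=float),
                                bootstrap=bootstrap)[0]
    return bool(stats["cilo"] < stats["q1"] or stats["cihi"] > stats["q3"])

rng = np.random.default_rng(0)
print(notch_is_flipped([1, 2, 3, 4, 5, 6, 7, 8]))  # small sample: True
print(notch_is_flipped(rng.normal(size=1000)))       # large sample: False
```

Since the notch half-width scales with 1/sqrt(n), the flipped shape shows up mostly for small samples or very skewed data.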

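Bootstrapping (random sampling with replacement to estimate the parameters of a sampling distribution, here the confidence interval of the median) may reduce this effect: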

From the plt.boxplot documentation:

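bootstrap : None (default) or integer. Specifies whether to bootstrap the confidence intervals around the median for notched boxplots. If bootstrap==None, no bootstrapping is performed, and notches are calculated using a Gaussian-based asymptotic approximation (see McGill, R., Tukey, J.W., and Larsen, W.A., 1978, and Kendall and Stuart, 1967). Otherwise, bootstrap specifies the number of times to bootstrap the median to determine its 95% confidence intervals. Values between 1000 and 10000 are recommended.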
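A minimal sketch of both routes, assuming made-up random data: `bootstrap_median_ci` is a hypothetical helper (not matplotlib's internal code) that shows what a percentile bootstrap of the median does, and passing an integer as `bootstrap` lets matplotlib run its own resampling when drawing the notches. Whether this shrinks the notch enough depends on the data; with very few points the interval can still reach past the quartiles.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line for interactive use
import matplotlib.pyplot as plt

def bootstrap_median_ci(sample, n_boot=5000, seed=0):
    """Percentile-bootstrap 95% confidence interval of the median."""
    rng = np.random.default_rng(seed)
    x = np.asarray(sample, dtype=float)
    # n_boot resamples, each the same size as x, drawn with replacement
    resamples = rng.choice(x, size=(n_boot, x.size), replace=True)
    medians = np.median(resamples, axis=1)
    # 2.5th and 97.5th percentiles of the bootstrapped medians
    return tuple(np.percentile(medians, [2.5, 97.5]))

rng = np.random.default_rng(1)
data = [rng.normal(size=8), rng.normal(size=200)]  # hypothetical data

print(bootstrap_median_ci(data[1]))

# matplotlib bootstraps the median itself when `bootstrap` is an integer
fig, ax = plt.subplots()
bplot = ax.boxplot(data, notch=True, bootstrap=5000, patch_artist=True)
# fig.savefig(...) or plt.show() (with an interactive backend) to view it
```

Unlike the Gaussian approximation, a bootstrapped interval can never extend beyond the smallest and largest observed values, because every resampled median is built from the data itself.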

Weird behaviour of matplotlib boxplot when using the notch shape
