Problem overlaying additional percentile markers on a seaborn boxplot

Time: 2019-05-27 04:54:09

Tags: python pandas seaborn boxplot

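I want to plot additional markers on a boxplot to show the 95th and 5th percentiles. I want the whiskers to show the 90th and 10th percentiles, which I believe I can do with whis=[10, 90].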

To test whether this works, I set both the markers and the whiskers to the 5th and 95th percentiles.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

assay=pd.read_csv('df.csv')

#obtain percentiles of interest
pcntls=assay.groupby(['STRAT']).describe(percentiles=[0.05,0.95])
sumry= pcntls['Total'].T



#plot boxplot (with whiskers set to 5 and 95 as well to check)
ax=sns.boxplot(x=assay['STRAT'],y=assay["Total"], whis=[5,95],data=assay, showfliers=False,color='lightblue', 
            showmeans=True,meanprops={"marker":"s","markersize":10,"markerfacecolor":"white", "markeredgecolor":"grey"})
plt.axhline(0.30, color='green',linestyle='dashed', label="0.3% S")
#ax.set_yscale('log')
leg= plt.legend()
plt.title("Assay data")


#overlay additional percentile points ( same as whiskers to check)
ax.scatter(x=list(sumry.columns.values),y=sumry.loc['5%'])
ax.scatter(x=list(sumry.columns.values),y=sumry.loc['95%'])

This gives me:

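The markers on the two rightmost boxes are not placed correctly (each marker should have the same y-value as the end of its whisker); the order of the data series seems to have been swapped. The percentiles also seem to be off: even with the correct order applied, the whiskers and markers do not match. Any ideas what is going wrong and how to fix it?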

Data below.

From    To  Interval (m)    Class   STRAT   Total
308 309 1   PAF CBC 4.15
309 310 1   PAF CBC 3.76
320 321 1   PAF-LC  CBC 0.85
330 331 1   PAF-LC  CBC 0.698
342 343 1   NAF LBB 0.259
376 377 1   NAF LBB 0.395
412 413 1   PAF-LC  LBB 1.19
51  52  1   PAF UBB 0.1
420 420.5   0.5 PAF-LC  UAB 1
189 190 1   PAF LBB 1.52
520 521 1   NAF UAB 3
632 633 1   NAF UAB 0.0615
644 645 1   NAF-AC  UAB 0.178
308 309 1   PAF CBC 4.15
309 310 1   PAF CBC 3.76
320 321 1   PAF-LC  CBC 0.85
330 331 1   PAF-LC  CBC 0.698
342 343 1   NAF-AC  LBB 0.259
376 377 1   NAF-AC  LBB 0.395
412 413 1   PAF-LC  LBB 1.19
51  52  1   PAF UBB 2.27
420 420.5   0.5 PAF-LC  UAB 1
189 190 1   PAF LBB 1.52
520 521 1   NAF-AC  UAB 1
632 633 1   NAF-AC  UAB 0.0615
644 645 1   NAF-AC  UAB 0.178
308 309 1   PAF CBC 4.15
309 310 1   PAF CBC 3.76
320 321 1   PAF-LC  CBC 0.85
330 331 1   PAF-LC  CBC 0.698
342 343 1   NAF-AC  LBB 0.259
376 377 1   NAF-AC  LBB 0.395
412 413 1   PAF-LC  LBB 1.19
51  52  1   PAF UBB 2.27
420 420.5   0.5 PAF-LC  UAB 0.002
189 190 1   PAF LBB 1.52
520 521 1   NAF-HS  UAB 1.45
632 633 1   NAF-HS  UAB 0.0615
644 645 1   NAF-HS  UAB 0.178
308 309 1   PAF CBC 4.15
309 310 1   PAF CBC 3.76
320 321 1   PAF-LC  CBC 0.85
330 331 1   PAF-LC  CBC 0.698
342 343 1   NAF-HS  LBB 0.259
376 377 1   NAF-HS  LBB 0.395
412 413 1   PAF-LC  LBB 1.19
51  52  1   PAF UBB 3
420 420.5   0.5 PAF-LC  UAB 1
189 190 1   PAF LBB 1.52
520 521 1   NAF-HS  UAB 1.45
632 633 1   NAF-HS  UAB 0.0615
644 645 1   NAF-HS  UAB 0.178
51  52  1   PAF UBB 0.1
51  52  1   PAF UBB 0.2
51  52  1   PAF UBB 2.27
51  52  1   PAF UBB 3

1 Answer:

Answer 0 (score: 1)

Simply sort the boxplot categories with:

ordered=sorted(assay['STRAT'].unique())

and do the same for the percentile data:

ax.scatter(x=sorted(list(sumry.columns.values)),y=sumry.loc['5%'])
ax.scatter(x=sorted(list(sumry.columns.values)),y=sumry.loc['95%'])
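To see why only the two rightmost boxes were affected, it can help to compare the two orderings involved. The sketch below is pure Python and rests on two assumptions: that seaborn, when no order is passed, places string categories in order of first appearance, and that the scatter call maps its string x-values to positions 0, 1, 2, ... in the order they are given. The strat list is the STRAT column of the first eleven data rows from the question.

```python
# STRAT values of the first eleven rows of the question's data.
strat = ["CBC", "CBC", "CBC", "CBC", "LBB", "LBB", "LBB", "UBB", "UAB", "LBB", "UAB"]

# Assumption: with no `order` argument, boxes follow first appearance.
appearance_order = list(dict.fromkeys(strat))

# groupby() sorts its keys, so the percentile columns come out alphabetically.
sorted_order = sorted(set(strat))

print(appearance_order)  # ['CBC', 'LBB', 'UBB', 'UAB']
print(sorted_order)      # ['CBC', 'LBB', 'UAB', 'UBB']

# Positions where a marker would land on the wrong box.
mismatched = [
    i for i, (box, marker) in enumerate(zip(appearance_order, sorted_order))
    if box != marker
]
print(mismatched)        # [2, 3]
```

Passing the same sorted list as order to the boxplot makes both sequences agree, which is what the fix above does.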

With some grid lines added, this gives:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

assay=pd.read_csv('df.csv')

#obtain percentiles of interest
pcntls=assay.groupby(['STRAT']).describe(percentiles=[0.05,0.95])
sumry= pcntls['Total'].T

ordered=sorted(assay['STRAT'].unique())

#plot boxplot (with whiskers set to 5 and 95 as well to check)
ax=sns.boxplot(x=assay['STRAT'],y=assay["Total"], order=ordered,whis=[5,95],data=assay, showfliers=False,color='lightblue', 
            showmeans=True,meanprops={"marker":"s","markersize":10,"markerfacecolor":"white", "markeredgecolor":"grey"})
plt.axhline(0.30, color='green',linestyle='dashed', label="0.3% S")
#ax.set_yscale('log')
leg= plt.legend()
plt.title("Assay data")

plt.grid(True, which='both')

#overlay additional percentile points ( same as whiskers to check)
ax.scatter(x=sorted(list(sumry.columns.values)),y=sumry.loc['5%'])
ax.scatter(x=sorted(list(sumry.columns.values)),y=sumry.loc['95%'])

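The result is now in the correct order, but there is a discrepancy in the calculated 95th percentile for UAB, which is probably due to the different percentile calculation methods and the small dataset.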
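The remaining gap for UAB may not be only a matter of interpolation method. As far as I understand matplotlib's box statistics, when whis is a pair of percentiles the whisker is drawn to the most extreme observation that lies inside the percentile limit, whereas describe returns the interpolated percentile itself. A minimal pure-Python sketch under that assumption (the helper names linear_percentile and whisker_top are hypothetical, not library functions):

```python
def linear_percentile(values, q):
    """q-th percentile (0-100), linearly interpolated between order statistics."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (pos - lower)


def whisker_top(values, q):
    """Largest observed value that does not exceed the q-th percentile.

    Assumption: this mirrors how the upper whisker is placed when `whis`
    is given as a pair of percentiles.
    """
    limit = linear_percentile(values, q)
    return max(v for v in values if v <= limit)


# UAB values of the Total column, copied from the question's data.
uab = [1, 3, 0.0615, 0.178, 1, 1, 0.0615, 0.178,
       0.002, 1.45, 0.0615, 0.178, 1, 1.45, 0.0615, 0.178]

p95 = linear_percentile(uab, 95)  # interpolated percentile (the marker)
top = whisker_top(uab, 95)        # last data point inside the limit (the whisker)
print(p95, top)
```

With only 16 UAB values, the interpolated 95th percentile falls between the two largest observations, so the marker sits above the whisker end even though both were asked for the same percentile.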