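I want to draw additional markers on a box plot to show the 95th and 5th percentiles. I would like the whiskers to show the 90th and 10th percentiles, which I believe I can do with whis=[10,90].
To test that it works, I set both the markers and the whiskers to 5 and 95.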
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
assay=pd.read_csv('df.csv')
#obtain percentiles of interest
pcntls=assay.groupby(['STRAT']).describe(percentiles=[0.05,0.95])
sumry= pcntls['Total'].T
#plot boxplot (with whiskers set to 5 and 95 as well to check)
ax=sns.boxplot(x=assay['STRAT'],y=assay["Total"], whis=[5,95],data=assay, showfliers=False,color='lightblue',
showmeans=True,meanprops={"marker":"s","markersize":10,"markerfacecolor":"white", "markeredgecolor":"grey"})
plt.axhline(0.30, color='green',linestyle='dashed', label="0.3% S")
#ax.set_yscale('log')
leg= plt.legend()
plt.title("Assay data")
#overlay additional percentile points ( same as whiskers to check)
ax.scatter(x=list(sumry.columns.values),y=sumry.loc['5%'])
ax.scatter(x=list(sumry.columns.values),y=sumry.loc['95%'])
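This gives me:
The markers on the two rightmost boxes are not placed correctly (each marker should have the same y value as the end of its whisker); the order of the data series appears to have been swapped. The percentiles also seem to be off: even with the correct order applied, the whiskers and markers would not match. Any ideas about what is going wrong and how to fix it?
Data below.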
From To Interval (m) Class STRAT Total
308 309 1 PAF CBC 4.15
309 310 1 PAF CBC 3.76
320 321 1 PAF-LC CBC 0.85
330 331 1 PAF-LC CBC 0.698
342 343 1 NAF LBB 0.259
376 377 1 NAF LBB 0.395
412 413 1 PAF-LC LBB 1.19
51 52 1 PAF UBB 0.1
420 420.5 0.5 PAF-LC UAB 1
189 190 1 PAF LBB 1.52
520 521 1 NAF UAB 3
632 633 1 NAF UAB 0.0615
644 645 1 NAF-AC UAB 0.178
308 309 1 PAF CBC 4.15
309 310 1 PAF CBC 3.76
320 321 1 PAF-LC CBC 0.85
330 331 1 PAF-LC CBC 0.698
342 343 1 NAF-AC LBB 0.259
376 377 1 NAF-AC LBB 0.395
412 413 1 PAF-LC LBB 1.19
51 52 1 PAF UBB 2.27
420 420.5 0.5 PAF-LC UAB 1
189 190 1 PAF LBB 1.52
520 521 1 NAF-AC UAB 1
632 633 1 NAF-AC UAB 0.0615
644 645 1 NAF-AC UAB 0.178
308 309 1 PAF CBC 4.15
309 310 1 PAF CBC 3.76
320 321 1 PAF-LC CBC 0.85
330 331 1 PAF-LC CBC 0.698
342 343 1 NAF-AC LBB 0.259
376 377 1 NAF-AC LBB 0.395
412 413 1 PAF-LC LBB 1.19
51 52 1 PAF UBB 2.27
420 420.5 0.5 PAF-LC UAB 0.002
189 190 1 PAF LBB 1.52
520 521 1 NAF-HS UAB 1.45
632 633 1 NAF-HS UAB 0.0615
644 645 1 NAF-HS UAB 0.178
308 309 1 PAF CBC 4.15
309 310 1 PAF CBC 3.76
320 321 1 PAF-LC CBC 0.85
330 331 1 PAF-LC CBC 0.698
342 343 1 NAF-HS LBB 0.259
376 377 1 NAF-HS LBB 0.395
412 413 1 PAF-LC LBB 1.19
51 52 1 PAF UBB 3
420 420.5 0.5 PAF-LC UAB 1
189 190 1 PAF LBB 1.52
520 521 1 NAF-HS UAB 1.45
632 633 1 NAF-HS UAB 0.0615
644 645 1 NAF-HS UAB 0.178
51 52 1 PAF UBB 0.1
51 52 1 PAF UBB 0.2
51 52 1 PAF UBB 2.27
51 52 1 PAF UBB 3
Answer 0 (score: 1)
Sort the box plot categories by building an explicit order and passing it to sns.boxplot as order=ordered:
ordered=sorted(assay['STRAT'].unique())
and apply the same order to the percentile data, indexing both x and y by it so they cannot drift apart:
ax.scatter(x=ordered,y=sumry.loc['5%',ordered])
ax.scatter(x=ordered,y=sumry.loc['95%',ordered])
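To see why only the last two groups were swapped, here is a minimal sketch in plain Python. It assumes that seaborn, when no order is given and the column is not a categorical dtype, places boxes in order of first appearance, while groupby returns its keys sorted (its default, sort=True). The list strat_column is a shortened, illustrative stand-in for the STRAT column, keeping the order in which the values first occur in the data above.

```python
# Illustrative stand-in for assay['STRAT'], in first-occurrence order of the data above
strat_column = ["CBC", "CBC", "LBB", "LBB", "UBB", "UAB", "LBB", "UAB"]

# Order of first appearance: assumed to be what the box plot uses without `order=`
appearance_order = list(dict.fromkeys(strat_column))

# Alphabetical order: what groupby(...).describe() yields with the default sort=True
sorted_order = sorted(set(strat_column))

print(appearance_order)  # ['CBC', 'LBB', 'UBB', 'UAB']
print(sorted_order)      # ['CBC', 'LBB', 'UAB', 'UBB']
```

The two orders agree for CBC and LBB and differ for UAB and UBB, which matches the two rightmost boxes being mismatched in the question.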
With some grid lines added, this gives:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
assay=pd.read_csv('df.csv')
#obtain percentiles of interest
pcntls=assay.groupby(['STRAT']).describe(percentiles=[0.05,0.95])
sumry= pcntls['Total'].T
ordered=sorted(assay['STRAT'].unique())
#plot boxplot (with whiskers set to 5 and 95 as well to check)
ax=sns.boxplot(x='STRAT',y='Total',data=assay,order=ordered,whis=[5,95],showfliers=False,color='lightblue',
               showmeans=True,meanprops={"marker":"s","markersize":10,"markerfacecolor":"white","markeredgecolor":"grey"})
plt.axhline(0.30, color='green',linestyle='dashed', label="0.3% S")
#ax.set_yscale('log')
leg= plt.legend()
plt.title("Assay data")
plt.grid(True, which='both')
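#overlay additional percentile points (same as whiskers to check), using the same order as the boxes
ax.scatter(x=ordered,y=sumry.loc['5%',ordered])
ax.scatter(x=ordered,y=sumry.loc['95%',ordered])
The result is plotted in the correct order, but there is still a discrepancy in the 95th percentile for UAB, which may be due to the different methods used to compute percentiles combined with the small dataset. See, for example, here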
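One possible explanation for the UAB discrepancy can be sketched in plain Python. The sketch assumes two things that are not stated in the post: that describe() interpolates linearly between data points, and that matplotlib draws a percentile whisker only as far as the most extreme data point inside the percentile bound rather than to the interpolated value itself. The helper linear_percentile is a hypothetical stand-in written for this illustration, not a library function. The UAB values are copied from the data above.

```python
import math

def linear_percentile(values, q):
    """Percentile q (0-100) using linear interpolation between closest ranks."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lower, upper = math.floor(pos), math.ceil(pos)
    return data[lower] + (pos - lower) * (data[upper] - data[lower])

# 'Total' values for STRAT == 'UAB', taken from the data in the question
uab_total = [1, 3, 0.0615, 0.178, 1, 1, 0.0615, 0.178,
             0.002, 1.45, 0.0615, 0.178, 1, 1.45, 0.0615, 0.178]

# Interpolated 95th percentile: where the scatter marker would be drawn
p95 = linear_percentile(uab_total, 95)

# Largest observation not exceeding it: where the whisker would end (assumed rule)
whisker_top = max(v for v in uab_total if v <= p95)

print(p95)          # roughly 1.8375, between the observations 1.45 and 3
print(whisker_top)  # 1.45
```

Under these assumptions the marker lands between two observations while the whisker stops at an actual data point. For CBC, LBB and UBB the interpolated percentiles coincide with observed values, which would explain why only UAB shows a visible gap.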