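I am trying to compute the whisker and box coordinates of a boxplot in matplotlib. I can't see where my mistake is, or why my values differ from the ones matplotlib draws.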
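import numpy as np
import matplotlib.pyplot as plt

# becher is the list of measurements given further down
Q1, median, Q3 = np.percentile(becher, [25, 50, 75])
IQR = Q3 - Q1
Qs = [Q1, median, Q3, Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]
Qname = ["Q1", "median", "Q3", "Q1-1.5xIQR", "Q3+1.5xIQR"]
# Draw a labelled horizontal line at each computed value
for Q, name in zip(Qs, Qname):
    plt.axhline(Q, color="k")
    plt.text(1.52, Q, name)
plt.boxplot(becher)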
As the figure below shows, Q1, Q3 and the median match, but the whiskers do not.
Here is my data:
becher = [9.1495,
9.9479,
9.7933,
9.8002,
8.47,
9.14,
9.06,
9.6933,
9.7871,
10.5676,
9.7441,
10.4874,
7.9584,
7.9598,
8.3483,
7.2536,
9.0823,
10.8343,
10.4104,
7.2004,
9.6297,
9.96,
9.761,
9.684,
8.6062,
10.2098,
8.9002,
8.4511,
9.3335,
9.34946,
8.0319,
7.6379,
7.8435,
8.7572,
8.0516,
8.4134,
10.0623,
9.6406,
9.0502,
10.6821,
11.1951,
11.1876,
10.0111,
8.8456,
10.2769,
9.3939,
11.3178,
9.397,
9.9851,
9.9921,
10.1132,
8.9775,
10.499,
11.209,
10.66,
10.2704,
10.9543,
10.6529,
10.9925,
9.6625,
7.8673,
9.0023,
8.9538,
9.3961,
8.8799,
9.3722,
10.697,
9.808,
9.894,
9.5648,
10.2994,
9.0708,
9.2368,
8.8131,
8.3218,
10.1733,
9.5885,
10.7685,
9.2015,
9.881,
9.4362,
9.9686,
9.3,
9.979,
9.896,
10.05,
9.9113,
8.533,
9.68297]
Answer 0 (score: 5)
There is one more adjustment, which is spelled out more explicitly in older docstrings, for example this one from Matplotlib v1.3.1:
*whis* : [ default 1.5 ]
Defines the length of the whiskers as a function of the inner
quartile range. They extend to the most extreme data point
within ( ``whis*(75%-25%)`` ) data range.
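So the whiskers extend to actual data points: each one stops at the most extreme observation that still lies within 1.5 x IQR of the box, not at Q1 - 1.5 x IQR or Q3 + 1.5 x IQR themselves. In your case, you can see this by adding a few lines to your script: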
# Convert the list to an array so the comparisons below are element-wise
becher = np.asarray(becher)
Q1, median, Q3 = np.percentile(becher, [25, 50, 75])
IQR = Q3 - Q1
# Limits computed from the quartiles
loval = Q1 - 1.5 * IQR
hival = Q3 + 1.5 * IQR
# Whiskers end at the most extreme data points inside those limits
wiskhi = np.compress(becher <= hival, becher)
wisklo = np.compress(becher >= loval, becher)
actual_hival = np.max(wiskhi)
actual_loval = np.min(wisklo)
Qs = [Q1, median, Q3, loval, hival, actual_loval, actual_hival]
Qname = ["Q1", "median", "Q3", "Q1-1.5xIQR", "Q3+1.5xIQR",
         "Actual LO", "Actual HI"]
for Q, name in zip(Qs, Qname):
    plt.axhline(Q, color="k")
    plt.text(1.52, Q, name)
plt.boxplot(becher)
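The same rule can be checked without plotting. The following is a minimal sketch, not Matplotlib's own implementation: the helper name `whisker_ends` and the small `sample` list are made up for illustration, and it assumes the default linear interpolation of `np.percentile` and a scalar `whis`.

```python
import numpy as np


def whisker_ends(data, whis=1.5):
    """Return (lo_fence, hi_fence, whisker_lo, whisker_hi, fliers).

    Hypothetical helper for illustration: the fences are the values
    computed from the quartiles, while the whiskers are the most
    extreme data points that still lie inside the fences.
    """
    values = np.asarray(data, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lo_fence = q1 - whis * iqr
    hi_fence = q3 + whis * iqr
    # Whiskers snap to real observations inside the fences
    whisker_lo = values[values >= lo_fence].min()
    whisker_hi = values[values <= hi_fence].max()
    # Anything beyond the whiskers would be drawn as an outlier point
    fliers = values[(values < whisker_lo) | (values > whisker_hi)]
    return lo_fence, hi_fence, whisker_lo, whisker_hi, fliers


# Made-up sample: nine evenly spaced values plus one obvious outlier
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
lo_fence, hi_fence, whisker_lo, whisker_hi, fliers = whisker_ends(sample)
print("fences:  ", lo_fence, hi_fence)
print("whiskers:", whisker_lo, whisker_hi)
print("fliers:  ", fliers)
```

For this sample the fences are -3.5 and 14.5, but the whiskers end at 1 and 9, the nearest data points inside the fences, and 30 is left over as an outlier. To compare against what Matplotlib computes, `matplotlib.cbook.boxplot_stats(becher)[0]` returns a dict whose `whislo` and `whishi` entries should correspond to the "Actual LO" and "Actual HI" lines above.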