I want to plot a histogram with Matplotlib, but I want the bin values to represent the percentage of total observations. An MWE would be:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import matplotlib.ticker as tck
import seaborn as sns
import numpy
sns.set(style='dark')
imagen2 = plt.figure(1, figsize=(5, 2))
imagen2.suptitle('StackOverflow Matplotlib histogram demo')
luminance = numpy.random.randn(1000, 1000)
# "Luminance" should range from 0.0...1.0 so we normalize it
luminance = (luminance - luminance.min())/(luminance.max() - luminance.min())
top_left = plt.subplot(121)
top_left.imshow(luminance)
bottom_left = plt.subplot(122)
sns.distplot(luminance.flatten(), kde_kws={"cumulative": True})
# plt.savefig("stackoverflow.pdf", dpi=300)
plt.tight_layout(rect=(0, 0, 1, 0.95))
plt.show()
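The CDF here is fine (range: [0, 1]), but the resulting histogram is not what I expected.
Why are the histogram values in the range [0, 4]? Is there any way to fix this?
Answer 0 (score: 1)
Here is how to plot a histogram so that the bin values sum to 1: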
import matplotlib.pyplot as plt
import matplotlib.ticker as tck
import seaborn as sns
import numpy as np
sns.set(style='dark')
imagen2 = plt.figure(1, figsize=(5, 2))
imagen2.suptitle('StackOverflow Matplotlib histogram demo')
luminance = np.random.randn(1000, 1000)
# "Luminance" should range from 0.0...1.0 so we normalize it
luminance = (luminance - luminance.min())/(luminance.max() - luminance.min())
# get the histogram values
heights,edges = np.histogram(luminance.flat, bins=30)
binCenters = (edges[:-1] + edges[1:])/2
# norm the heights
heights = heights/heights.sum()
# get the cdf
cdf = heights.cumsum()
left = plt.subplot(121)
left.imshow(luminance)
right = plt.subplot(122)
right.plot(binCenters, cdf, binCenters, heights)
# plt.savefig("stackoverflow.pdf", dpi=300)
plt.tight_layout(rect=(0, 0, 1, 0.95))
plt.show()
# confirm that the hist vals sum to 1
print('heights sum: %.2f' % heights.sum())
Output:
heights sum: 1.00
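Since the question asks for percentages rather than fractions, the same normalization can be scaled by 100. The following is only a minimal sketch of that step using NumPy alone; the helper name bin_percentages, the fixed random seed, and the smaller array size are assumptions made for illustration and are not part of the script above.

```python
import numpy as np


def bin_percentages(values, bins=30):
    """Return (percent, edges); percent[i] is the share of observations in bin i."""
    counts, edges = np.histogram(np.ravel(values), bins=bins)
    # Dividing by the total count gives fractions; times 100 gives percentages
    percent = 100.0 * counts / counts.sum()
    return percent, edges


# Hypothetical stand-in for the luminance data (smaller, seeded for repeatability)
rng = np.random.default_rng(0)
luminance = rng.standard_normal((200, 200))
luminance = (luminance - luminance.min()) / (luminance.max() - luminance.min())

percent, edges = bin_percentages(luminance)
cdf_percent = percent.cumsum()  # running total, ends at 100
print('percent sum: %.2f' % percent.sum())
```

The percent array can be plotted against the bin centers exactly as heights is plotted above.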
This one is actually pretty simple. Just do
sns.distplot(luminance.flatten(), kde_kws={"cumulative": True}, norm_hist=True)
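Here is what I get when I run the script with the above modification.
So it turns out your histogram was already normalized all along, according to the formal identity for a density: the bin areas sum to 1.
In plain(er) English, the general practice is to normalize continuous-valued histograms (i.e. those whose observations can be represented as floating-point numbers) in terms of their density. In that case the sum of bin width times bin height is 1.0, as you can see by running this simplified version of the script: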
import matplotlib.pyplot as plt
import matplotlib.ticker as tck
import numpy as np
imagen2 = plt.figure(1, figsize=(4,3))
imagen2.suptitle('StackOverflow Matplotlib histogram demo')
luminance = np.random.randn(1000, 1000)
luminance = (luminance - luminance.min())/(luminance.max() - luminance.min())
heights,edges,patches = plt.hist(luminance.ravel(), density=True, bins=30)
widths = edges[1:] - edges[:-1]
totalWeight = (heights*widths).sum()
# plt.savefig("stackoverflow.pdf", dpi=300)
plt.tight_layout(rect=(0, 0, 1, 0.95))
plt.show()
print(totalWeight)
totalWeight does indeed come out to 1.0, give or take a bit of rounding error.
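This also explains the [0, 4] range in the question: a density height is the bin's fraction of observations divided by the bin width, and with data squeezed into [0, 1] the bins are narrow, so heights above 1 are expected. The sketch below, which uses NumPy only, converts density heights back into per-bin fractions; the seed and the sample size are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical stand-in for the flattened luminance data
rng = np.random.default_rng(0)
luminance = rng.standard_normal(100_000)
luminance = (luminance - luminance.min()) / (luminance.max() - luminance.min())

# density=True returns heights whose bin areas (height * width) sum to 1
density, edges = np.histogram(luminance, bins=30, density=True)
widths = np.diff(edges)

# Multiplying each height by its bin width recovers the fraction per bin
fractions = density * widths

# Cross-check against plain counts divided by the number of observations
counts, _ = np.histogram(luminance, bins=30)
print('area under histogram: %.2f' % fractions.sum())
print('tallest density bar exceeds 1:', density.max() > 1.0)
print('fractions match counts/N:', np.allclose(fractions, counts / counts.sum()))
```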
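Answer 1 (score: 0)
tel's answer is great! I just want to offer an alternative that gives you the desired histogram in fewer lines. The key idea is to use the weights parameter of matplotlib's hist function to normalize the counts. You can replace sns.distplot(luminance.flatten(), kde_kws={"cumulative": True}) with the following three lines of code: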
lf = luminance.flatten()
sns.kdeplot(lf, cumulative=True)
sns.distplot(lf, kde=False,
hist_kws={'weights': numpy.full(len(lf), 1/len(lf))})
If you want to see the histogram on a second y-axis (which looks better), add ax=bottom_left.twinx() to the sns.distplot call.
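For readers who want the same layout without seaborn, here is a matplotlib-only sketch of the weights idea on a twin axis. It is not the seaborn call from this answer: the cumulative curve is an empirical CDF rather than a KDE, and the variable names, seed, sample size, and the Agg backend are assumptions made so the example runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as tck
import numpy as np

# Hypothetical stand-in for the flattened luminance data
rng = np.random.default_rng(0)
lf = rng.standard_normal(10_000)
lf = (lf - lf.min()) / (lf.max() - lf.min())

fig, cdf_ax = plt.subplots(figsize=(4, 3))

# Empirical CDF on the left axis: sorted values against their cumulative share
cdf_ax.plot(np.sort(lf), np.arange(1, lf.size + 1) / lf.size)
cdf_ax.set_ylabel('cumulative fraction')

# Histogram on a second y-axis; a weight of 1/N per observation turns counts into fractions
hist_ax = cdf_ax.twinx()
heights, edges, patches = hist_ax.hist(
    lf, bins=30, weights=np.full(lf.size, 1 / lf.size), alpha=0.5)

# Show the fractions as percentages on the right-hand axis
hist_ax.yaxis.set_major_formatter(tck.PercentFormatter(xmax=1.0))

fig.tight_layout()
# fig.savefig("stackoverflow.pdf", dpi=300)
```

Keeping the histogram on its own axis means its small per-bin values are not flattened by the 0 to 1 scale of the cumulative curve.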