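Normalizing a histogram with matplotlib

Date: 2018-04-11 18:15:45

Tags: python matplotlib histogram seaborn

I want to plot a histogram with Matplotlib, but I want the bin values to represent the percentage of total observations. An MWE would look like this: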

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import matplotlib.pyplot as plt
import matplotlib.ticker as tck
import seaborn as sns
import numpy

sns.set(style='dark')

imagen2 = plt.figure(1, figsize=(5, 2))
imagen2.suptitle('StackOverflow Matplotlib histogram demo')

luminance = numpy.random.randn(1000, 1000)
# "Luminance" should range from 0.0...1.0 so we normalize it
luminance = (luminance - luminance.min())/(luminance.max() - luminance.min())

top_left = plt.subplot(121)
top_left.imshow(luminance)
bottom_left = plt.subplot(122)
sns.distplot(luminance.flatten(), kde_kws={"cumulative": True})

# plt.savefig("stackoverflow.pdf", dpi=300)
plt.tight_layout(rect=(0, 0, 1, 0.95))
plt.show()

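The CDF here is fine (range: [0, 1]), but the resulting histogram is not what I expected:

[Image: histogram with values out of valid range]

Why are the histogram values in the range [0, 4]? Is there any way to fix this?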

2 answers:

Answer 0 (score: 1)

What you think you want

Here is how to plot a histogram such that the bin heights sum to 1:

import matplotlib.pyplot as plt
import matplotlib.ticker as tck
import seaborn as sns
import numpy as np

sns.set(style='dark')

imagen2 = plt.figure(1, figsize=(5, 2))
imagen2.suptitle('StackOverflow Matplotlib histogram demo')

luminance = np.random.randn(1000, 1000)
# "Luminance" should range from 0.0...1.0 so we normalize it
luminance = (luminance - luminance.min())/(luminance.max() - luminance.min())

# get the histogram values
heights,edges = np.histogram(luminance.flat, bins=30)
binCenters = (edges[:-1] + edges[1:])/2

# norm the heights
heights = heights/heights.sum()

# get the cdf
cdf = heights.cumsum()

left = plt.subplot(121)
left.imshow(luminance)
right = plt.subplot(122)
right.plot(binCenters, cdf, binCenters, heights)

# plt.savefig("stackoverflow.pdf", dpi=300)
plt.tight_layout(rect=(0, 0, 1, 0.95))
plt.show()

# confirm that the hist vals sum to 1
print('heights sum: %.2f' % heights.sum())

Output:


heights sum: 1.00

The actual answer

This one is actually very simple. Just do

sns.distplot(luminance.flatten(), kde_kws={"cumulative": True}, norm_hist=True)

Here is what I get when I run the script with the above modification:

[Image: plot produced by the modified script]

Surprise twist!

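So it turns out your histogram was normalized all along, in the sense of the formal identity:

$$\sum_{i} h_i \, w_i = 1$$

where $h_i$ is the height of bin $i$ and $w_i$ is its width.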

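In plain(er) English, the general practice is to normalize histograms of continuous-valued data (i.e. data whose observations can be represented as floating point numbers) by their density. In that case the sum of bin width times bin height is 1.0, so individual bars can be taller than 1 whenever the bins are narrower than 1. You can check this by running this simplified version of your script: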

import matplotlib.pyplot as plt
import matplotlib.ticker as tck
import numpy as np

imagen2 = plt.figure(1, figsize=(4,3))
imagen2.suptitle('StackOverflow Matplotlib histogram demo')

luminance = np.random.randn(1000, 1000)
luminance = (luminance - luminance.min())/(luminance.max() - luminance.min())

heights,edges,patches = plt.hist(luminance.ravel(), density=True, bins=30)
widths = edges[1:] - edges[:-1]

totalWeight = (heights*widths).sum()

# plt.savefig("stackoverflow.pdf", dpi=300)
plt.tight_layout(rect=(0, 0, 1, 0.95))
plt.show()
print(totalWeight)

totalWeight is indeed equal to 1.0, give or take a bit of rounding error.
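The difference between the two normalizations can also be checked without plotting anything. The following is a minimal sketch using only NumPy; the smaller array size and the fixed seed are assumptions made here to keep it quick and reproducible, not part of the original script. Because the data is rescaled to an interval of length 1, the density averages to 1 over that interval, so a peaked distribution has to rise well above 1 near its center.

```python
import numpy as np

# Assumption: a smaller array and a fixed seed, purely for a fast, repeatable run
rng = np.random.default_rng(0)
luminance = rng.standard_normal((200, 200))
luminance = (luminance - luminance.min()) / (luminance.max() - luminance.min())

counts, edges = np.histogram(luminance.ravel(), bins=30)
widths = np.diff(edges)

# Density normalization: sum(height * width) == 1, so narrow bins give tall bars
density = counts / (counts.sum() * widths)

# Fraction normalization: the heights themselves sum to 1
fraction = counts / counts.sum()

# NumPy's own density=True result, for comparison with the manual formula
density_np, _ = np.histogram(luminance.ravel(), bins=30, density=True)

print("max density height:  ", density.max())
print("sum(density * width):", (density * widths).sum())
print("max fraction height: ", fraction.max())
print("sum(fraction):       ", fraction.sum())
```

The density heights are just the fraction heights divided by the bin width, which is roughly 1/30 here. That factor is what pushes the y-axis of the original plot up toward 4.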

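Answer 1 (score: 0)

tel's answer is great! I just want to offer an alternative that gives you the histogram you want in fewer lines. The key idea is to use the weights parameter of matplotlib's hist function to normalize the counts. You can replace sns.distplot(luminance.flatten(), kde_kws={"cumulative": True}) with the following three lines of code: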

lf = luminance.flatten()
sns.kdeplot(lf, cumulative=True)
sns.distplot(lf, kde=False,
             hist_kws={'weights': numpy.full(len(lf), 1/len(lf))})


If you want to see the histogram on a second y-axis (which looks better), add ax=bottom_left.twinx() to the sns.distplot call.
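The same idea can be sketched with matplotlib alone, which may be useful where sns.distplot is unavailable. This is an illustration under stated assumptions rather than the answer's exact code: it swaps seaborn's KDE-based cumulative curve for a plain empirical CDF, uses a smaller array with a fixed seed, and the names ax_cdf and ax_hist are made up for this example. PercentFormatter labels the twin axis as a percentage of total observations, which is what the question asks for.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove for interactive use
import matplotlib.pyplot as plt
import matplotlib.ticker as tck
import numpy as np

# Assumption: a smaller array and a fixed seed, purely for a fast, repeatable run
rng = np.random.default_rng(0)
luminance = rng.standard_normal((200, 200))
luminance = (luminance - luminance.min()) / (luminance.max() - luminance.min())
lf = luminance.ravel()

fig, ax_cdf = plt.subplots(figsize=(4, 3))
ax_hist = ax_cdf.twinx()  # second y-axis sharing the same x-axis

# Each observation gets weight 1/N, so bar heights are fractions of the total
weights = np.full(lf.size, 1.0 / lf.size)
heights, edges, _ = ax_hist.hist(lf, bins=30, weights=weights, alpha=0.5)

# Show the fractions as percentages (a height of 1.0 corresponds to 100%)
ax_hist.yaxis.set_major_formatter(tck.PercentFormatter(xmax=1.0))

# Empirical CDF on the primary axis (stands in for the cumulative KDE)
ax_cdf.plot(np.sort(lf), np.arange(1, lf.size + 1) / lf.size)
ax_cdf.set_ylim(0, 1)

fig.tight_layout()
# fig.savefig("histogram_percent.png", dpi=300)
```

Putting the histogram on its own axis keeps the bars readable, since the tallest bar is only a few percent while the CDF spans the full range from 0 to 1.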
