Question

Reading through the matplotlib plt.hist documentation, I found a density parameter that can be set to True. The documentation says

density : bool, optional
            If ``True``, the first element of the return tuple will
            be the counts normalized to form a probability density, i.e.,
            the area (or integral) under the histogram will sum to 1.
            This is achieved by dividing the count by the number of
            observations times the bin width and not dividing by the total
            number of observations. If *stacked* is also ``True``, the sum of
            the histograms is normalized to 1.

My question is about the line "This is achieved by dividing the count by the number of observations times the bin width and not dividing by the total number of observations".

I tried to replicate this with sample data.

**Using matplotlib's built-in calculation**:

import numpy as np
import pandas as pd

ser = pd.Series(np.random.normal(size=1000))
ser.hist(density=True, bins=100)

**Manual calculation of the density**:

import matplotlib.pyplot as plt

arr_hist, edges = np.histogram(ser, bins=100)
samp = arr_hist / ser.shape[0] * np.diff(edges)
plt.bar(edges[0:-1], samp)
plt.grid()

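The two plots have completely different y-axis scales. Can someone point out what exactly is going wrong and how to replicate the density calculation manually?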

Answer 1

This is an ambiguity in the wording. The sentence

This is achieved by dividing the count by the number of observations times the bin width

needs to be read as

This is achieved by dividing (the count) by (the number of observations times the bin width)

i.e.

count / (number of observations * bin width)
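To make the two readings concrete, here is a small sketch with hand-picked numbers. The data values and bin edges are made up for illustration and are not from the question. It compares the intended reading with the one used in the question, and checks the intended one against `np.histogram(..., density=True)`.

```python
import numpy as np

# Hypothetical toy data and uneven bin edges, chosen only for illustration
data = np.array([0.1, 0.2, 0.3, 0.6, 1.5, 1.7])
bin_edges = np.array([0.0, 0.5, 1.0, 2.0])

counts, edges = np.histogram(data, bins=bin_edges)  # counts per bin: 3, 1, 2
widths = np.diff(edges)                             # bin widths: 0.5, 0.5, 1.0
n_obs = data.size                                   # 6 observations

# Intended reading: count / (number of observations * bin width)
density = counts / (n_obs * widths)

# Reading used in the question: (count / number of observations) * bin width
misread = counts / n_obs * widths

# NumPy's own normalization, for comparison
reference, _ = np.histogram(data, bins=bin_edges, density=True)

# Area under the histogram: sum of bar height times bar width
area = np.sum(density * widths)

print("density  :", density)
print("misread  :", misread)
print("reference:", reference)
print("area     :", area)
```

Only the first reading gives bars whose total area is 1, because each bar contributes height times width, which reduces to count / number of observations.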

Complete code:

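import numpy as np
import matplotlib.pyplot as plt

arr = np.random.normal(size=1000)

fig, (ax1, ax2) = plt.subplots(2)

# Top: let matplotlib normalize the histogram
ax1.hist(arr, density=True, bins=100)
ax1.grid()

# Bottom: density = count / (number of observations * bin width)
arr_hist, edges = np.histogram(arr, bins=100)
samp = arr_hist / (arr.shape[0] * np.diff(edges))
# align="edge" puts each bar's left side on its bin's left edge, as hist does
ax2.bar(edges[:-1], samp, width=np.diff(edges), align="edge")
ax2.grid()

plt.show()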
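If you would rather confirm the match numerically than by eye, the following sketch compares the first element of the tuple returned by `ax.hist` with the manual calculation. The fixed seed and the `Agg` backend are assumptions added here so the check is repeatable and runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
arr = rng.normal(size=1000)

fig, ax = plt.subplots()
# With density=True, the first returned element holds the normalized heights
n, bins, patches = ax.hist(arr, bins=100, density=True)

# Manual calculation: count / (number of observations * bin width)
counts, edges = np.histogram(arr, bins=100)
manual = counts / (arr.size * np.diff(edges))

matches = np.allclose(n, manual)   # do both approaches agree?
area = np.sum(n * np.diff(bins))   # total area under the histogram

print("heights match:", matches)
print("area:", area)
```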

How does matplotlib calculate the density for a histogram
