Reading through the matplotlib plt.hist documentation, I see there is a density parameter that can be set to True. The documentation says
density : bool, optional
If ``True``, the first element of the return tuple will
be the counts normalized to form a probability density, i.e.,
the area (or integral) under the histogram will sum to 1.
This is achieved by dividing the count by the number of
observations times the bin width and not dividing by the total
number of observations. If *stacked* is also ``True``, the sum of
the histograms is normalized to 1.
Regarding the line
This is achieved by dividing the count by the number of observations times the bin width and not dividing by the total number of observations
I tried to replicate it with sample data.
**Using matplotlib's built-in calculation**:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
ser = pd.Series(np.random.normal(size=1000))
ser.hist(density=True, bins=100)
**Manual calculation of the density**:
arr_hist , edges = np.histogram( ser, bins =100)
samp = arr_hist / ser.shape[0] * np.diff(edges)
plt.bar(edges[0:-1] , samp )
plt.grid()
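The two plots have completely different y-axis scales. Can someone point out what exactly is going wrong and how to replicate the density calculation manually?
Answer 0 (score: 2)
This is an ambiguity in the language. The sentence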
This is achieved by dividing the count by the number of observations times the bin width
needs to be read as
This is achieved by dividing (the count) by (the number of observations times the bin width)
i.e.
count / (number of observations * bin width)
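This grouping also explains the scale mismatch in the question: in Python, `/` and `*` have equal precedence and are evaluated left to right, so `count / n * width` means `(count / n) * width`. A minimal sketch with made-up numbers (the names `count`, `n_obs`, and `width` and their values are purely illustrative, not taken from the question's data):

```python
# Hypothetical values for a single bin, chosen only for illustration
count = 30      # observations falling in the bin
n_obs = 1000    # total number of observations
width = 0.05    # bin width

# Evaluated left to right: (count / n_obs) * width
without_parens = count / n_obs * width

# The grouping the documentation intends: count / (n_obs * width)
with_parens = count / (n_obs * width)

# The two results differ by a factor of 1 / width**2
ratio = with_parens / without_parens
print(without_parens, with_parens, ratio)
```

With narrow bins the factor `1 / width**2` is large, which would account for the very different y-axis scales.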
Complete code:
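import numpy as np
import matplotlib.pyplot as plt

arr = np.random.normal(size=1000)
fig, (ax1, ax2) = plt.subplots(2)

# Top: density computed by matplotlib
ax1.hist(arr, density=True, bins=100)
ax1.grid()

# Bottom: density computed by hand as count / (number of observations * bin width)
arr_hist, edges = np.histogram(arr, bins=100)
samp = arr_hist / (arr.shape[0] * np.diff(edges))
# align='edge' puts each bar's left side on its bin's left edge (the default centers the bar there)
ax2.bar(edges[:-1], samp, width=np.diff(edges), align='edge')
ax2.grid()

plt.show()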
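The plots can only be compared by eye, so a numerical check may be useful as well. The sketch below assumes that `np.histogram(..., density=True)` is an acceptable stand-in for what `hist` computes, and uses an arbitrarily chosen seed so the run is reproducible. It compares the manual formula against NumPy's result and confirms that the area under the histogram is 1:

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, only for reproducibility
arr = rng.normal(size=1000)

# Raw counts and bin edges
counts, edges = np.histogram(arr, bins=100)
widths = np.diff(edges)

# Manual density: count / (number of observations * bin width)
manual = counts / (arr.shape[0] * widths)

# Density as computed by NumPy itself
reference, _ = np.histogram(arr, bins=100, density=True)

# Area under the histogram: sum of (bar height * bar width)
area = np.sum(manual * widths)

print(np.allclose(manual, reference), area)
```

If the grouping is right, the two arrays agree to floating-point precision and the area comes out as 1 up to rounding.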