Question

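I sometimes need to histogram discrete values with matplotlib. In that case the choice of binning can be crucial: if you histogram [0,1,2,3,4,5,6,7,8,9,10] using 10 bins, one of the bins will have twice as many counts as the others. In other words, the bin size should normally be a multiple of the discretization size.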
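
To make this concrete, here is a small check with numpy.histogram, which plt.hist uses for its binning. The variable names are only for illustration.

```python
import numpy as np

values = np.arange(11)  # 0, 1, ..., 10
counts, edges = np.histogram(values, bins=10)

print(edges)   # 11 edges from 0 to 10, so each bin is 1 wide
print(counts)  # the last bin is closed on the right, so it holds both 9 and 10
```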

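While this simple case is relatively easy to handle by hand, does anyone have a pointer to a library/function that would take care of this automatically, including the case of floating-point data, where the discretization size could vary slightly due to FP rounding?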

Thanks.

Answer 1

Perhaps a less complete answer than J Richard Snape's, but one that I recently learned and found intuitive and simple.

import numpy as np
import matplotlib.pyplot as plt

# great seed
np.random.seed(1337)

# how many times will a fair die land on the same number out of 100 trials.
data = np.random.binomial(n=100, p=1/6, size=1000)

# the trick is to set up the bins centered on the integers, i.e.
# -0.5, 0.5, 1.5, 2.5, ... up to max(data) + 0.5. np.arange(0, max + 1.5)
# yields 0, 1, ..., max + 1, and subtracting 0.5 shifts every edge by half a bin.
bins = np.arange(0, data.max() + 1.5) - 0.5

# then you plot away
fig, ax = plt.subplots()
_ = ax.hist(data, bins)
ax.set_xticks(bins + 0.5)

It turns out that a fair die lands on the same number in about 16 out of 100 throws!
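
The same idea can be wrapped in a small helper. This is only a sketch: `integer_bin_edges` is a name I made up, and it assumes the data are integers.

```python
import numpy as np

def integer_bin_edges(data):
    """Return bin edges centered on every integer from min(data) to max(data)."""
    data = np.asarray(data)
    lo, hi = int(data.min()), int(data.max())
    # hi + 2 so that the last edge is hi + 0.5 after the shift
    return np.arange(lo, hi + 2) - 0.5

edges = integer_bin_edges([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
counts, _ = np.histogram(np.arange(11), bins=edges)
print(edges[0], edges[-1])  # -0.5 10.5
print(counts)               # exactly one count per integer
```

The edges can then be passed to the plot as ax.hist(data, bins=integer_bin_edges(data)).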

Answer 2

Another version that handles the simple case in just a small amount of code! This time using numpy.unique and matplotlib's vlines:

import numpy as np
import matplotlib.pyplot as plt

# same seed/data as Manuel Martinez to make plot easy to compare
np.random.seed(1337)
data = np.random.binomial(100, 1/6, 1000)

values, counts = np.unique(data, return_counts=True)

plt.vlines(values, 0, counts, color='C0', lw=4)

# optionally set y-axis up nicely
plt.ylim(0, max(counts) * 1.06)

The resulting plot looks very readable.

Answer 3

Not exactly what the OP asked for, but calculating bins is not required if all values are integers.

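np.unique(d, return_counts=True) returns a tuple with the array of unique values as its first element and their counts as the second. This can be passed directly into plt.bar(x, height) using the star operator: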

import numpy as np
import matplotlib.pyplot as plt

d = [1,1,2,4,4,4,5,6]
plt.bar(*np.unique(d, return_counts=True))

The result is a bar chart with one bar per unique value, whose height is that value's count.
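
To see what is being unpacked into plt.bar, the tuple can be inspected on its own (the names `values` and `counts` are just for illustration):

```python
import numpy as np

d = [1, 1, 2, 4, 4, 4, 5, 6]
values, counts = np.unique(d, return_counts=True)
print(values)  # [1 2 4 5 6]
print(counts)  # [2 1 3 1 1]
```

Since 3 never occurs in d, it gets no entry at all, so the chart simply shows a gap there.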

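Note that this technically works for floating-point numbers too, but the results may be unexpected, because a bar is created for every distinct number.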
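
One possible workaround for floats, sketched under the assumption that the discretization step is known in advance: snap every value to the nearest multiple of the step before counting. `snap_to_grid` and `step` are hypothetical names, not part of numpy or matplotlib.

```python
import numpy as np

def snap_to_grid(data, step):
    """Map each value to the index of the nearest multiple of `step`."""
    return np.rint(np.asarray(data) / step).astype(int)

step = 0.1  # assumption: the discretization size is known
d = [0.1 * 3, 0.3, 0.1 + 0.2, 0.5, 0.7 - 0.2]

# Without snapping, rounding error splits equal-looking values apart.
raw_values, raw_counts = np.unique(d, return_counts=True)
print(len(raw_values))  # more distinct values than the two intended ones

idx, counts = np.unique(snap_to_grid(d, step), return_counts=True)
print(idx * step)  # approximately [0.3 0.5]
print(counts)      # [3 2]
```

The snapped result could then be drawn with plt.bar(idx * step, counts, width=0.8 * step), so that the bars do not overlap.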

Histogram of discrete values with matplotlib
