Question

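I want to populate dummy data for attendance. For example, I want 60% of the students to have attendance in the range 70-100%, 25% in the range 40-60%, and 15% in the range 0-40%. How can I generate this with random numbers in Python? Is there a built-in function for this? I know numpy.random.choice lets you assign probabilities to predefined discrete values, but is there a way to specify probabilities for bins/ranges?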

Answer 1

If you know the number of students N, you can do this:

import numpy as np

N = 1000  # number of students (example value, use your own)
N_ha = int(N * 0.6)  # students with high attendance
N_la = int(N * 0.15)  # students with low attendance
N_aa = N - N_ha - N_la  # students with average attendance (everyone else)

att_ha = np.random.random(N_ha) * 0.3 + 0.7  # this creates N_ha attendances in the half-open range [0.7, 1)
att_la = np.random.random(N_la) * 0.4  # half-open range [0, 0.4)
att_aa = np.random.random(N_aa) * 0.2 + 0.4  # [0.4, 0.6); sure you didn't mean between 40 and 70? in that case, substitute 0.2 with 0.3

attendances = np.concatenate([att_ha, att_la, att_aa])
np.random.shuffle(attendances)  # shuffles in place so the three groups are mixed
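If the group sizes do not have to be exact, a possible variant is to draw each student's bin with a probability-weighted choice and then draw uniformly inside that bin. This is only a sketch: it assumes the 40-60 range from the question with a 25% share, uses the newer np.random.default_rng generator, and the names N, lows, highs, bins, att and the seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only to make the sketch reproducible

N = 1000  # number of students (example value)
lows = np.array([0.0, 0.4, 0.7])   # lower edge of each attendance bin
highs = np.array([0.4, 0.6, 1.0])  # upper edge of each attendance bin
probs = [0.15, 0.25, 0.6]          # probability of landing in each bin

# Pick a bin index (0, 1 or 2) for every student according to probs.
bins = rng.choice(len(probs), size=N, p=probs)

# Draw uniformly inside the chosen bin; low/high are broadcast per student.
att = rng.uniform(lows[bins], highs[bins])

# Observed share of students per bin (close to probs, but not exact).
print(np.bincount(bins, minlength=3) / N)
```

Unlike the fixed-count version, the shares here fluctuate from run to run, and because the bins need not touch, the 60-70 gap stays empty.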

Hope this helps!

Answer 2

You can use np.interp like this:

>>> import numpy as np
>>> ranges = [0, 0.4, 0.7, 1.0]
>>> probs = [0.15, 0.25, 0.6]
>>> # translate to cumulative probabilities
>>> P = np.r_[0, np.cumsum(probs)]
>>> # draw and transform
>>> samples = np.interp(np.random.random((1_000_000,)), P, ranges)
>>> # check how many samples fall in each range
>>> np.count_nonzero(samples < 0.4)
149477
>>> np.count_nonzero(samples > 0.7)
600394
>>> np.count_nonzero((samples < 0.7) & (samples > 0.4))
250129

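The subpopulations will be uniformly distributed within their ranges.

np.interp builds a piecewise linear function. Used the way we do here, it takes samples uniformly distributed on [0, 1], splits them into the groups 0-15%, 15-40% and 40-100%, and maps those groups onto the ranges 0-40%, 40-70% and 70-100% respectively.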
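To make the mapping concrete, here is a small sketch that pushes a few hand-picked inputs through the same np.interp call. The inputs in u (the breakpoints plus the midpoint of each group) and the name mapped are illustrative choices, not part of the original answer.

```python
import numpy as np

ranges = [0, 0.4, 0.7, 1.0]
probs = [0.15, 0.25, 0.6]
P = np.r_[0, np.cumsum(probs)]  # input-side breakpoints: 0, 0.15, 0.4, 1.0

# Breakpoints of the cumulative probabilities plus the midpoint of each group.
u = np.array([0.0, 0.075, 0.15, 0.275, 0.4, 0.7, 1.0])
mapped = np.interp(u, P, ranges)

for ui, mi in zip(u, mapped):
    print(f"u = {ui:.3f} -> attendance = {mi:.3f}")
```

Each breakpoint lands on the edge of an attendance range, and each group midpoint lands in the middle of its target range (for instance, u = 0.075 goes to about 0.2 and u = 0.7 to about 0.85). That linear stretch within each segment is what keeps every subpopulation uniform.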
