Quickly assigning an array to n bins of equal length

Time: 2017-07-21 15:12:52

Tags: python numpy binning discretization

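For example, I have a stream of arrays whose numbers range from 0.0 to 10.0.

I want to quickly assign the numbers in arr to 5 bins of equal length.

By equal length I mean that the bin intervals are [0.0, 2.0), [2.0, 4.0), [4.0, 6.0), [6.0, 8.0), [8.0, 10.0]

The problem is that the last interval differs from the others: it is closed on the right.

Test: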

import numpy as np
# Things we know and can pre-calculate
n_bins = 5
minimal = 0.0  
maximal = 10.0
reciprocal_bin_length = n_bins / (maximal - minimal)

# Let's say the stream gives 1001 numbers every time.
data = np.arange(1001)/100

norm_data = (data - minimal) * reciprocal_bin_length
norm_data = norm_data.astype(int)
print(norm_data.max())
print(norm_data.min())

Result:

5
0

The bin index should be 0, 1, 2, 3, or 4, but never 5.

2 answers:

Answer 0: (score: 3)
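To see where the 5 comes from, here is a minimal check using the same scaling as the test above. It suggests that the only sample reaching index 5 is the one equal to `maximal`; the names `idx` and `offenders` are introduced here purely for illustration.

```python
import numpy as np

n_bins = 5
minimal, maximal = 0.0, 10.0
data = np.arange(1001) / 100

# Same scaling as in the test: multiply by the reciprocal bin length, then truncate.
idx = ((data - minimal) * (n_bins / (maximal - minimal))).astype(int)

# Samples that received the out-of-range index n_bins.
offenders = data[idx == n_bins]
print(offenders)
```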

A "poor man's solution" is to take the element-wise minimum of the array norm_data and n_bins - 1:

norm_data = np.minimum(norm_data, n_bins - 1)

As a result, every 5 (and anything above) is converted to 4. Note, of course, that this performs no proper range check: 120.0 would also end up in bin 4.
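If the missing range check matters, one possible sketch is to keep the `np.minimum` trick and mask out-of-range values separately. The helper name `assign_bins` and the choice of -1 as the out-of-range marker are assumptions made for this example, not part of the original answer.

```python
import numpy as np

def assign_bins(data, minimal, maximal, n_bins):
    """Hypothetical helper: bin indices in 0..n_bins-1, or -1 if out of range."""
    data = np.asarray(data, dtype=float)
    scaled = (data - minimal) * (n_bins / (maximal - minimal))
    # Fold the closed upper edge (data == maximal) into the last bin.
    idx = np.minimum(scaled.astype(int), n_bins - 1)
    # astype(int) truncates toward zero, so slightly negative inputs would
    # otherwise land in bin 0; flag everything outside [minimal, maximal].
    in_range = (data >= minimal) & (data <= maximal)
    return np.where(in_range, idx, -1)

result = assign_bins([0.0, 1.99, 2.0, 10.0, 120.0, -0.5], 0.0, 10.0, 5)
print(result)
```

The extra comparison and `np.where` cost some speed, so this is only worth it when the stream can actually contain values outside the known range.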

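Answer 1: (score: 0)

If an error of 0.1% is acceptable, the following is somewhat faster. I am not sure whether floating-point rounding behaves properly here.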

import numpy as np
# Things we know and can pre-calculate
n_bins = 5
minimal = 0.0  
maximal = 10.0
approx = 1.001  # <-- this is new
reciprocal_bin_length = n_bins / (maximal*approx - minimal)

# Let's say the stream gives 1001 numbers every time.
data = np.arange(1001)/100

# can use numexpr for speed.
norm_data = (data - minimal) * reciprocal_bin_length
norm_data = norm_data.astype(int)
print(norm_data.max())
print(norm_data.min())
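To get a feel for what the 0.1% stretch costs, the following sketch compares the stretched scaling against the exact scaling (with the top edge folded into the last bin via `np.minimum`) on the same sample data. The names `exact`, `fudged`, and `moved` are illustrative. On this sample the maximum index stays at 4, and the samples that change bin are the ones sitting exactly on the interior edges, which drop into the bin below.

```python
import numpy as np

n_bins = 5
minimal, maximal = 0.0, 10.0
approx = 1.001  # same factor as in the answer above
data = np.arange(1001) / 100

# Exact scaling, with the closed top edge folded into the last bin.
exact = ((data - minimal) * (n_bins / (maximal - minimal))).astype(int)
exact = np.minimum(exact, n_bins - 1)

# Approximate scaling: the upper bound is stretched by 0.1%.
fudged = ((data - minimal) * (n_bins / (maximal * approx - minimal))).astype(int)

# Samples whose bin differs between the two schemes.
moved = data[exact != fudged]
print(fudged.max(), moved)
```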