Digitizing a numpy array

Date: 2015-08-05 22:49:00

Tags: python numpy

I have two vectors:

  time_vec = np.array([0.2,0.23,0.3,0.4,0.5,...., 28....])
  values_vec = np.array([500,200,220,250,200,...., 218....])
  time_vec.shape == values_vec.shape 

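Now I want to bin the values into 0.5-second intervals and take the mean of each bin. For example,

  value_vec = np.array([mean_of(500,200,220,250,200), mean_of(next values in next 0.5 second interval)])

Is there a numpy method I'm missing that bins the values and takes the mean of each bin?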

3 answers:

Answer 0 (score: 3)

You can use np.ufunc.reduceat (here np.add.reduceat). You only need to locate the breakpoints, i.e. the positions where floor(t / .5) changes.

Suppose:

>>> t
array([ 0.    ,  0.025 ,  0.2125,  0.2375,  0.2625,  0.3375,  0.475 ,  0.6875,  0.7   ,  0.7375,  0.8   ,  0.9   ,
        0.925 ,  1.05  ,  1.1375,  1.15  ,  1.1625,  1.1875,  1.1875,  1.225 ])
>>> b
array([ 0.8144,  0.3734,  1.4734,  0.6307, -0.611 , -0.8762,  1.6064,  0.3863, -0.0103, -1.6889, -0.4328, -0.7373,
        1.7856,  0.8938, -1.1574, -0.4029, -0.4352, -0.4412, -1.7819, -0.3298])

The breakpoints are:

>>> i = np.r_[0, 1 + np.nonzero(np.diff(np.floor(t / .5)))[0]]
>>> i
array([ 0,  7, 13])

The sum over each interval is:

>>> np.add.reduceat(b, i)
array([ 3.411 , -0.6975, -3.6545])

and the means are those sums divided by the number of samples in each interval:

>>> np.add.reduceat(b, i) / np.diff(np.r_[i, len(b)])
array([ 0.4873, -0.1162, -0.5221])
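The three steps can be combined into one function. This is a minimal sketch: the name `binned_means_reduceat` and the `width` parameter are hypothetical, and it assumes `t` is non-empty and sorted in ascending order, because `reduceat` only sums contiguous runs.

```python
import numpy as np

def binned_means_reduceat(t, values, width=0.5):
    """Hypothetical helper: mean of `values` in consecutive `width`-wide bins of `t`.

    Assumes `t` is non-empty and sorted ascending. Bins are
    [k*width, (k+1)*width); only non-empty bins appear in the result.
    """
    t = np.asarray(t, dtype=float)
    values = np.asarray(values, dtype=float)
    # Bin number of every sample
    bin_no = np.floor(t / width)
    # Start index of each run of equal bin numbers (the breakpoints)
    starts = np.r_[0, 1 + np.nonzero(np.diff(bin_no))[0]]
    # Sum each run, then divide by the run length
    sums = np.add.reduceat(values, starts)
    counts = np.diff(np.r_[starts, len(values)])
    # Also return the left edge of each non-empty bin
    return bin_no[starts] * width, sums / counts
```

Returning the left edges alongside the means makes it clear which interval each mean belongs to, since empty intervals are skipped.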

Answer 1 (score: 2)

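You can pass the weights= argument to np.histogram to compute the summed values within each time bin, then normalize by the number of samples in each bin: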

import numpy as np

# 0.5 second time bins to average within; the stop value is one step past
# the last edge needed, because np.arange excludes its stop value
tmin = time_vec.min()
tmax = time_vec.max()
bins = np.arange(tmin - (tmin % 0.5), tmax - (tmax % 0.5) + 1.0, 0.5)

# summed values within each bin
bin_sums, edges = np.histogram(time_vec, bins=bins, weights=values_vec)

# number of values within each bin
bin_counts, edges = np.histogram(time_vec, bins=bins)

# average value within each bin (empty bins give nan and a RuntimeWarning)
bin_means = bin_sums / bin_counts
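A sketch of the same np.histogram idea wrapped in a hypothetical helper, `binned_means_histogram`. It assumes non-empty input, builds the edges from an integer number of bins so the last bin always reaches past the maximum time, and uses `np.divide` with `where=` so that empty bins come out as NaN without a divide-by-zero warning.

```python
import numpy as np

def binned_means_histogram(time_vec, values_vec, width=0.5):
    """Hypothetical helper: per-bin means via weighted np.histogram."""
    time_vec = np.asarray(time_vec, dtype=float)
    values_vec = np.asarray(values_vec, dtype=float)
    # First edge: multiple of `width` at or below the minimum time.
    # Last edge: one full bin above the multiple at or below the maximum.
    start = np.floor(time_vec.min() / width) * width
    stop = np.floor(time_vec.max() / width) * width + width
    n_bins = int(round((stop - start) / width))
    # Integer-based edges avoid float np.arange length surprises
    edges = start + width * np.arange(n_bins + 1)
    sums, _ = np.histogram(time_vec, bins=edges, weights=values_vec)
    counts, _ = np.histogram(time_vec, bins=edges)
    # Divide only where a bin has samples; empty bins stay NaN
    means = np.full(sums.shape, np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    return edges, means
```

Unlike the reduceat and bincount approaches, this keeps one entry per 0.5 s interval, empty or not, so `means[k]` always corresponds to `[edges[k], edges[k+1])`.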

Answer 2 (score: 0)

You can use np.bincount, which is very efficient for this kind of binning operation. Here is an implementation based on it for our case -

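import numpy as np

# Assumes time_vec is sorted in ascending order.
# Find indices where the 0.5 s intervals shift onto the next ones
A = time_vec*2
idx = np.searchsorted(A, np.arange(1, int(np.ceil(A.max()))), 'right')

# Set up an ID array so that all samples in the same 0.5 s interval get the same ID.
# Intervals are right-closed (a time of exactly 0.5 stays in the first one),
# and empty intervals are skipped, so the IDs are consecutive
out = np.zeros(A.size, dtype=int)
out[idx[idx < A.size]] = 1
ID = out.cumsum()

# Finally use bincount to sum and count the elements of each ID,
# and thus get the mean value per ID
mean_vec = np.bincount(ID, values_vec)/np.bincount(ID)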

Sample run -

In [189]: time_vec
Out[189]: 
array([ 0.2 ,  0.23,  0.3 ,  0.4 ,  0.5 ,  0.7 ,  0.8 ,  0.92,  0.95,
        1.  ,  1.11,  1.5 ,  2.  ,  2.3 ,  2.5 ,  4.5 ])

In [190]: values_vec
Out[190]: array([36, 11, 93, 32, 72, 75, 26, 28, 77, 31, 60, 77, 76, 32,  6, 85])

In [191]: ID
Out[191]: array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 3, 4, 4, 5], dtype=int32)

In [192]: mean_vec
Out[192]: array([ 48.8,  47.4,  68.5,  76. ,  19. ,  85. ])
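If time_vec is not sorted, the searchsorted step does not apply directly. A hedged alternative sketch (the helper name `binned_means_unique` is hypothetical) computes a bin key per sample with `np.ceil(t / 0.5) - 1`, which reproduces the right-closed intervals of the code above for positive times, and lets `np.unique(..., return_inverse=True)` number the occupied bins before the same np.bincount division. On the sample run's data it gives the same mean_vec.

```python
import numpy as np

def binned_means_unique(time_vec, values_vec, width=0.5):
    """Hypothetical helper: bincount means without requiring sorted times."""
    time_vec = np.asarray(time_vec, dtype=float)
    values_vec = np.asarray(values_vec, dtype=float)
    # Right-closed bins (k*width, (k+1)*width]: ceil(t / width) - 1 equals k
    key = np.ceil(time_vec / width) - 1
    # Number the occupied bins 0, 1, 2, ... in key order; the inverse maps
    # each sample to its bin, so the input order does not matter
    keys, ID = np.unique(key, return_inverse=True)
    mean_vec = np.bincount(ID, weights=values_vec) / np.bincount(ID)
    return keys, mean_vec
```

As with the original, empty intervals produce no entry; `keys` holds the interval index k of each returned mean (k = 8 for the last bin of the sample run, i.e. (4.0, 4.5]).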