Fastest way to map array columns to new intervals

Date: 2017-09-18 09:25:28

Tags: python arrays numpy matrix transform

I have the following matrix, which represents some points:

import numpy as np

points = np.random.uniform(30, 50, size=(5, 3))
# gives array([[ 45.98139489,  40.27871523,  41.91617071],
#              [ 41.1404787 ,  34.56098247,  35.91171313],
#              [ 34.46375465,  49.89872417,  39.04753134],
#              [ 49.28112722,  32.01837698,  32.83394596],
#              [ 48.96623168,  33.58271833,  33.54690091]])

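Each column is a coordinate, and every column's values lie in the range [30, 50]. I want to map each column to a different interval. From this question I know how to map points from one interval to another: Algorithm to map an interval to a smaller interval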
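For a single interval, the mapping in that question is the usual linear rescale: subtract the source minimum, scale by the ratio of the target width to the source width, then add the target minimum. A small sketch, where map_interval is a hypothetical helper name used only for illustration:

```python
import numpy as np

def map_interval(x, src_lo, src_hi, dst_lo, dst_hi):
    """Linearly map x from [src_lo, src_hi] onto [dst_lo, dst_hi] (hypothetical helper)."""
    # Shift to start at 0, stretch to the target width, then shift to the target start
    return (x - src_lo) * (dst_hi - dst_lo) / (src_hi - src_lo) + dst_lo

# Endpoints and midpoint of [30, 50] mapped onto [0, 10]
result = map_interval(np.array([30.0, 40.0, 50.0]), 30, 50, 0, 10)
print(result)
```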

But I want something fast that maps each column to a (possibly) different interval. For example, suppose we have

intervals = np.array([[0, 10], [3,7], [100,200]])

Alternatively, we could keep them as separate arrays, such as xinterval = np.array([0,10]), but that doesn't really matter.
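If the intervals are kept as separate per-axis arrays, they can be stacked into the same (3, 2) layout with np.vstack, after which intervals[:, 0] and intervals[:, 1] give the per-column lower and upper bounds. The names yinterval and zinterval are hypothetical, extending the xinterval example:

```python
import numpy as np

# Separate per-axis intervals (yinterval and zinterval are hypothetical names)
xinterval = np.array([0, 10])
yinterval = np.array([3, 7])
zinterval = np.array([100, 200])

# Stack row-wise into one (3, 2) array: row i holds [low, high] for column i
intervals = np.vstack([xinterval, yinterval, zinterval])
lows, highs = intervals[:, 0], intervals[:, 1]
print(intervals.shape, lows, highs)
```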

My slow attempt

I collect all the intervals in intervals and then transform each column in a loop:

for col, interval in enumerate(intervals):
    points[:, col] = ((points[:, col] - min(points[:, col])) * (interval[1] - interval[0]) / (max(points[:, col]) - min(points[:, col]))) + interval[0]

For simplicity I used the min/max range as the source interval, but I could just as well have used 30, 50:

for col, interval in enumerate(intervals):
    points[:, col] = ((points[:, col] - 30) * (interval[1] - interval[0]) / (50 - 30)) + interval[0]

Is there a faster way that avoids the loop?

1 answer:

Answer 0 (score: 1)

Straightforward broadcasting

Here's one vectorized way leveraging broadcasting -

mins = points.min(0)
a1 = (points - mins)* (intervals[:,1]-intervals[:,0])
a2 = points.max(0) - mins
out = a1/a2 + intervals[:,0]
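The same broadcasting pattern also handles the fixed source range [30, 50] from the question. The per-column arrays intervals[:, 0] and intervals[:, 1] have shape (3,), which matches the trailing axis of the (5, 3) points array, so NumPy stretches them across every row. A minimal sketch, where map_fixed_range and its src_lo/src_hi defaults are hypothetical names chosen for illustration:

```python
import numpy as np

def map_fixed_range(points, intervals, src_lo=30.0, src_hi=50.0):
    """Map each column of points from [src_lo, src_hi] onto its own row of intervals."""
    lows = intervals[:, 0]            # shape (n_cols,)
    widths = intervals[:, 1] - lows   # shape (n_cols,)
    # (n_rows, n_cols) combined with (n_cols,) broadcasts the per-column values over all rows
    return (points - src_lo) * widths / (src_hi - src_lo) + lows

rng = np.random.default_rng(0)
points = rng.uniform(30, 50, size=(5, 3))
intervals = np.array([[0, 10], [3, 7], [100, 200]])
out = map_fixed_range(points, intervals)
print(points.shape, intervals[:, 0].shape, out.shape)  # (5, 3) (3,) (5, 3)
```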

Improvement: fewer broadcasting steps

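Looking closer, we are performing broadcasting in a few places. Although broadcasting is a very efficient vectorization tool, it still has some cost. We can improve on it by rearranging things so that the number of full-size broadcasting steps drops to two instead of the previous four.

So, the modified version would be -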

mins = points.min(0)
scale = (intervals[:,1]-intervals[:,0])/(points.max(0) - mins)
offset = mins*scale - intervals[:,0]
out = points *scale - offset
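The rearrangement relies on the identity (p - m)·s + lo = p·s - (m·s - lo), with m = mins, s = scale and lo = intervals[:,0]. The bracketed term only has shape (3,), so it costs almost nothing, and only two operations touch the full array. A quick numerical check on random test data (the data here is an assumption, not from the answer):

```python
import numpy as np

rng = np.random.default_rng(42)
points = rng.uniform(30, 50, size=(1000, 3))
intervals = np.array([[0, 10], [3, 7], [100, 200]])

mins = points.min(0)
scale = (intervals[:, 1] - intervals[:, 0]) / (points.max(0) - mins)

# Original form: four operations on the full (1000, 3) array
direct = (points - mins) * (intervals[:, 1] - intervals[:, 0]) / (points.max(0) - mins) + intervals[:, 0]

# Rearranged form: offset = m*s - lo is only shape (3,), leaving two full-size operations
offset = mins * scale - intervals[:, 0]
rearranged = points * scale - offset

print(np.allclose(direct, rearranged))
```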

I. Broadcasting steps before:

Two at: (points - mins)* (intervals[:,1]-intervals[:,0])

Two at: a1/a2 + intervals[:,0]

II. Broadcasting steps after the improvement:

One at points *scale and one at the subtraction that follows.
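If memory allocation also matters, one further tweak (not part of the original answer and not benchmarked here) is to let the subtraction reuse the buffer produced by the multiplication, so only one full-size result array is allocated. Whether this is measurably faster depends on array size and hardware. app2_inplace is a hypothetical name:

```python
import numpy as np

def app2_inplace(points, intervals):
    """Variant of app2 that subtracts in place (hypothetical tweak, not benchmarked)."""
    mins = points.min(0)
    scale = (intervals[:, 1] - intervals[:, 0]) / (points.max(0) - mins)
    offset = mins * scale - intervals[:, 0]
    out = np.multiply(points, scale)  # first full-size pass allocates the result
    out -= offset                     # second pass subtracts in place, no new temporary
    return out
```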

Runtime test

Approaches -

def app1(points, intervals):
    mins = points.min(0)
    a1 = (points - mins)* (intervals[:,1]-intervals[:,0])
    a2 = points.max(0) - mins
    out = a1/a2 + intervals[:,0]
    return out

def app2(points, intervals):
    mins = points.min(0)
    scale = (intervals[:,1]-intervals[:,0])/(points.max(0) - mins)
    offset = mins*scale - intervals[:,0]
    out = points *scale - offset
    return out

Timings -

In [104]: points = np.array([[ 45.98139489,  40.27871523,  41.91617071],
     ...:                [ 41.1404787 ,  34.56098247,  35.91171313],
     ...:                [ 34.46375465,  49.89872417,  39.04753134],
     ...:                [ 49.28112722,  32.01837698,  32.83394596],
     ...:                [ 48.96623168,  33.58271833,  33.54690091]])
     ...: points = np.repeat(points, 100000,axis=0)
     ...: 
     ...: intervals = np.array([[0, 10], [3,7], [100,200]])
     ...: 

In [105]: %timeit app1(points, intervals)
10 loops, best of 3: 26.3 ms per loop

In [106]: %timeit app2(points, intervals)
100 loops, best of 3: 17.9 ms per loop
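The timings above come from IPython's %timeit. To repeat a similar comparison in a plain script, here is a sketch using the standard-library timeit module that also includes the question's loop as a baseline. The data size mirrors the np.repeat(..., 100000) setup above but uses random values, app_loop is a hypothetical name, and absolute numbers will vary with machine and NumPy version:

```python
import timeit
import numpy as np

def app_loop(points, intervals):
    # Column-by-column loop from the question, applied to a copy so the input is untouched
    out = points.copy()
    for col, (lo, hi) in enumerate(intervals):
        c = points[:, col]
        out[:, col] = (c - c.min()) * (hi - lo) / (c.max() - c.min()) + lo
    return out

def app2(points, intervals):
    # Two full-size broadcasting steps, as in the answer
    mins = points.min(0)
    scale = (intervals[:, 1] - intervals[:, 0]) / (points.max(0) - mins)
    offset = mins * scale - intervals[:, 0]
    return points * scale - offset

rng = np.random.default_rng(0)
points = rng.uniform(30, 50, size=(500_000, 3))
intervals = np.array([[0, 10], [3, 7], [100, 200]])

# Sanity check before timing: both approaches must agree
assert np.allclose(app_loop(points, intervals), app2(points, intervals))

for fn in (app_loop, app2):
    # Best of 3 repeats of 10 calls each, reported per call in milliseconds
    best = min(timeit.repeat(lambda: fn(points, intervals), number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.1f} ms per call")
```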