I have the following matrix, which represents a set of points:
import numpy as np

points = np.random.uniform(30, 50, size=(5, 3))
# gives array([[ 45.98139489, 40.27871523, 41.91617071],
#              [ 41.1404787 , 34.56098247, 35.91171313],
#              [ 34.46375465, 49.89872417, 39.04753134],
#              [ 49.28112722, 32.01837698, 32.83394596],
#              [ 48.96623168, 33.58271833, 33.54690091]])
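Each column is one coordinate, and the values in every column lie in the range [30, 50]. I want to map each column to a different interval. From the following question I know how to map points from one interval to another: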
Algorithm to map an interval to a smaller interval
However, I want something fast that maps each column to a (possibly) different interval. For example, suppose we have
intervals = np.array([[0, 10], [3,7], [100,200]])
Alternatively, we could keep them in separate arrays, such as xinterval = np.array([0,10]), but that is not important.
My slow attempt
I collected all the intervals in intervals, then looped over the columns and applied the transformation to each one:
for col, interval in zip(range(points.shape[1]), intervals):
    points[:, col] = ((points[:,col]-min(points[:,col]))*(interval[1]-interval[0]) / (max(points[:,col])-min(points[:,col])) ) + interval[0]
For simplicity I used each column's min and max as the source interval, but I could have used 30 and 50 instead:
for col, interval in zip(range(points.shape[1]), intervals):
    points[:, col] = ((points[:,col]-30)*(interval[1]-interval[0]) / (50-30) ) + interval[0]
Is there a faster way that does not use a loop?
Answer 0 (score: 1)
Direct broadcasting
Here is one approach that makes use of broadcasting -
mins = points.min(0)
a1 = (points - mins)* (intervals[:,1]-intervals[:,0])
a2 = points.max(0) - mins
out = a1/a2 + intervals[:,0]
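As a rough sanity check, the broadcasting expression above can be compared with the loop from the question. The sketch below is self-contained; the fixed seed, the `looped` copy, and the single-line form of the expression are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility only
points = rng.uniform(30, 50, size=(5, 3))
intervals = np.array([[0, 10], [3, 7], [100, 200]])

# Loop version from the question, written into a copy so `points` stays untouched.
looped = points.copy()
for col, (lo, hi) in enumerate(intervals):
    column = points[:, col]
    looped[:, col] = (column - column.min()) * (hi - lo) / (column.max() - column.min()) + lo

# Broadcasting version: each per-column vector of shape (3,) is stretched
# across the 5 rows of the (5, 3) points array.
mins = points.min(0)
out = (points - mins) * (intervals[:, 1] - intervals[:, 0]) / (points.max(0) - mins) + intervals[:, 0]

print(np.allclose(out, looped))
```

Because the per-column minimum and maximum are used as the source range, every output column should span its target interval from end to end.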
Improvement: fewer broadcasting steps
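Looking closely, we are performing broadcasting in several places. Although broadcasting is a very efficient vectorization method, it still carries some cost. We can improve on this by rearranging the computation so that the number of broadcasting steps against the full points array drops to two, instead of the four used earlier.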
Thus, the modified version would be -
mins = points.min(0)
scale = (intervals[:,1]-intervals[:,0])/(points.max(0) - mins)
offset = mins*scale - intervals[:,0]
out = points *scale - offset
I. Broadcasting steps before:
Two in (points - mins)* (intervals[:,1]-intervals[:,0]).
Two in a1/a2 + intervals[:,0].
II. Broadcasting steps after the improvement:
One in points *scale, and one in the subtraction that follows.
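The rearrangement relies on a small piece of algebra: (points - mins) * width / span + lower equals points * scale - offset when scale = width / span and offset = mins * scale - lower. A minimal sketch that checks this numerically is given below; the names `width`, `span`, `direct` and `rearranged` and the random test data are assumptions introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed: an assumption, for reproducibility only
points = rng.uniform(30, 50, size=(1000, 3))
intervals = np.array([[0, 10], [3, 7], [100, 200]])

mins = points.min(0)
width = intervals[:, 1] - intervals[:, 0]   # target interval lengths, shape (3,)
span = points.max(0) - mins                 # source range per column, shape (3,)

# Direct form: four elementwise passes over the full (1000, 3) array.
direct = (points - mins) * width / span + intervals[:, 0]

# Rearranged form: scale and offset are computed on small (3,) vectors,
# leaving only two passes over the full array.
scale = width / span
offset = mins * scale - intervals[:, 0]
rearranged = points * scale - offset

print(np.allclose(direct, rearranged))
```

The two forms can differ in the last few floating-point digits because the operations are applied in a different order, which is why the comparison uses np.allclose rather than exact equality.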
Runtime test
Approaches -
def app1(points, intervals):
    mins = points.min(0)
    a1 = (points - mins)* (intervals[:,1]-intervals[:,0])
    a2 = points.max(0) - mins
    out = a1/a2 + intervals[:,0]
    return out

def app2(points, intervals):
    mins = points.min(0)
    scale = (intervals[:,1]-intervals[:,0])/(points.max(0) - mins)
    offset = mins*scale - intervals[:,0]
    out = points *scale - offset
    return out
Timings -
In [104]: points = np.array([[ 45.98139489, 40.27871523, 41.91617071],
...: [ 41.1404787 , 34.56098247, 35.91171313],
...: [ 34.46375465, 49.89872417, 39.04753134],
...: [ 49.28112722, 32.01837698, 32.83394596],
...: [ 48.96623168, 33.58271833, 33.54690091]])
...: points = np.repeat(points, 100000,axis=0)
...:
...: intervals = np.array([[0, 10], [3,7], [100,200]])
...:
In [105]: %timeit app1(points, intervals)
10 loops, best of 3: 26.3 ms per loop
In [106]: %timeit app2(points, intervals)
100 loops, best of 3: 17.9 ms per loop
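The %timeit magic above is only available inside IPython. Outside IPython, a comparable measurement might be sketched with the standard timeit module, as below. The helper `best_ms`, the randomly generated base array, and the repeat counts are assumptions made for illustration, and the absolute numbers will depend on the machine and NumPy version.

```python
import timeit

import numpy as np


def app1(points, intervals):
    mins = points.min(0)
    a1 = (points - mins) * (intervals[:, 1] - intervals[:, 0])
    a2 = points.max(0) - mins
    return a1 / a2 + intervals[:, 0]


def app2(points, intervals):
    mins = points.min(0)
    scale = (intervals[:, 1] - intervals[:, 0]) / (points.max(0) - mins)
    offset = mins * scale - intervals[:, 0]
    return points * scale - offset


# Same shape as in the session above: a 5 x 3 base array repeated 100000 times.
base = np.random.default_rng(0).uniform(30, 50, size=(5, 3))
points = np.repeat(base, 100000, axis=0)
intervals = np.array([[0, 10], [3, 7], [100, 200]])


def best_ms(func, number=10, repeat=3):
    """Best average time per call in milliseconds (hypothetical helper)."""
    times = timeit.repeat(lambda: func(points, intervals), number=number, repeat=repeat)
    return min(times) / number * 1e3


t1 = best_ms(app1)
t2 = best_ms(app2)
print(f"app1: {t1:.1f} ms per call")
print(f"app2: {t2:.1f} ms per call")
```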