I have a large array of floats, and I multiply the array by a scalar. What is the best (fastest) way to do this in Julia?

decay1 and decay2 are vectorized implementations, decay3 is devectorized, and the multithreaded version runs with 4 threads. I see the following timings:

Vectorized:
2.291 ms (4 allocations: 112 bytes)
Devectorized:
2.221 ms (0 allocations: 0 bytes)
Multithreaded:
1.963 ms (1 allocation: 32 bytes)
Multithreaded array multiplication:
87.418 ms (2 allocations: 30.52 MiB)
Scale:
2.042 ms (0 allocations: 0 bytes)
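For context, here is a minimal sketch of the kind of variants being compared. The original decay1 to decay3 definitions are not shown, so every function name below is hypothetical, and using `rmul!` for the "Scale" case is an assumption rather than the code that was actually timed.

```julia
using LinearAlgebra  # provides rmul!

# Hypothetical stand-ins; the original decay1..decay3 code is not shown.

# Vectorized: in-place broadcast, so no temporary array is allocated.
scale_broadcast!(x::AbstractVector, a::Number) = (x .*= a; x)

# Devectorized: explicit loop, bounds checks disabled, SIMD hint.
function scale_loop!(x::AbstractVector, a::Number)
    @inbounds @simd for i in eachindex(x)
        x[i] *= a
    end
    return x
end

# Multithreaded: the same loop, split across the threads Julia was started with.
function scale_threads!(x::AbstractVector, a::Number)
    Threads.@threads for i in eachindex(x)
        @inbounds x[i] *= a
    end
    return x
end

# Library routine (assumed equivalent of "Scale"): scales the array in place.
scale_rmul!(x::AbstractVector, a::Number) = rmul!(x, a)
```

The thread count is fixed when Julia starts, for example with `julia -t 4` or `JULIA_NUM_THREADS=4`. The timing lines above look like `@btime` output from BenchmarkTools, which is also an assumption; with that package, a call such as `@btime scale_loop!($x, 0.99)` would time a single variant.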
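import numpy as np
import tensorflow as tf  # TF 1.x graph-mode API (tf.placeholder)

K = 10
myarray1 = tf.placeholder(tf.float32, shape=[None, 5, 5])
myarray2 = tf.Variable(np.zeros([K, 5, 5]), dtype=tf.float32)
vals = []
for k in range(K):
    # Elementwise product with the k-th 5x5 slice, summed over the last two axes.
    tmp = tf.reduce_sum(myarray1 * myarray2[k], axis=(1, 2))
    vals.append(tmp)
# tf.min does not exist; tf.reduce_min takes the minimum over the K stacked sums.
result = tf.reduce_min(tf.stack(vals, axis=-1), axis=-1)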
The speedup is clearly too low. What exactly am I doing wrong, and how can I do better?