So I have `X` and `Y`, which are binary vectors (elements 0 or 1) about 30000 elements long. What I want to do, exactly, is find a fast way to calculate `np.inner(X, Y) / np.sum(Y)`. I don't think much can be done about the `np.sum(Y)` part, which just counts how many 1s there are in `Y`. The `np.inner(X, Y)` part counts the number of 1s in `X` whose corresponding element (i.e. same index) in `Y` is also 1; i.e. it works out the same as:
```python
def count_common_ones(X, Y):
    # Count positions where both X and Y are 1.
    re = 0
    for i in range(X.shape[0]):
        if X[i] == 1 and Y[i] == 1:
            re += 1
    return re
```
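For concreteness, here is a quick sanity check (on made-up random data; the real `X` and `Y` come from elsewhere) that the loop above, wrapped as `count_common_ones`, agrees with `np.inner`:

```python
import numpy as np

# Made-up binary test vectors, just to verify the equivalence.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=30000)
Y = rng.integers(0, 2, size=30000)

assert count_common_ones(X, Y) == np.inner(X, Y)
print(np.inner(X, Y) / np.sum(Y))
```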
I think `np.inner` is the most straightforward way and also the fastest I can think of within `numpy`. However, the performance is still not satisfactory: this calculation makes up most of the computational cost of a function that has to be iterated hundreds of times, and the total runtime comes out to around 90s. So I've been thinking quite hard about optimising it. I tried `X @ Y.T`, which unsurprisingly didn't make much difference. I also checked that my `numpy` package is linked against the common optimised backends like `blas` and `mkl` (the only one unavailable is `openblas`). Anyway, is there any possible room for speeding up this calculation?
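For reference, this is a minimal benchmark sketch of the two variants mentioned above (the random data and the 30000-element size are stand-ins for illustration):

```python
import timeit
import numpy as np

# Hypothetical stand-in data; the real X and Y come from elsewhere.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=30000)
Y = rng.integers(0, 2, size=30000)

# Time the two variants of the numerator.
for label, fn in [("np.inner(X, Y)", lambda: np.inner(X, Y)),
                  ("X @ Y.T", lambda: X @ Y.T)]:
    t = timeit.timeit(fn, number=1000)
    print(f"{label}: {t / 1000 * 1e6:.1f} µs per call")
```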