Python: faster kernel evaluation function

Time: 2014-07-23 10:36:43

Tags: python numpy scipy scikit-learn

I have a function like the one below that evaluates the kernel between two instances x and y:

import numpy as np

def my_hik(x, y):
    """Histogram intersection kernel."""
    summe = 0
    for i in range(len(x)):  # xrange in Python 2
        summe += min(x[i], y[i])
    return summe
    #return np.sum(np.min(np.array([[x],[y]]),0))

from sklearn import metrics

# instances: the data array, shape (n_samples, n_features)
K = metrics.pairwise.pairwise_kernels(instances, metric=my_hik, n_jobs=-1)

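I call it through sklearn's pairwise_kernels function. But my data (about 3000 instances with a hundred attributes each) seems to be too large: computing a single kernel matrix takes several minutes, because the function is called 9 × 10^6 times (3000 × 3000). Is there a way to make the function run faster?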
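Most of that time is per-call Python overhead: with a callable metric, pairwise_kernels invokes the Python function separately for each pair of rows. A different technique from the answer below, sketched here under stated assumptions, is to skip the per-pair calls altogether and build the whole histogram-intersection Gram matrix with NumPy broadcasting. The names hik_gram and chunk_size are hypothetical; rows are processed in chunks so the temporary array stays at (chunk_size, n_Y, n_features) instead of materializing all 3000 × 3000 × 100 values at once.

```python
import numpy as np

def hik_gram(X, Y=None, chunk_size=20):
    """Hypothetical helper: K[i, j] = sum_k min(X[i, k], Y[j, k]).

    If Y is None, the kernel of X with itself is computed.
    """
    X = np.asarray(X, dtype=float)
    Y = X if Y is None else np.asarray(Y, dtype=float)
    K = np.empty((X.shape[0], Y.shape[0]))
    for start in range(0, X.shape[0], chunk_size):
        block = X[start:start + chunk_size]  # shape (b, d)
        # (b, 1, d) vs (1, m, d) broadcasts to (b, m, d); sum over features.
        # For 3000 x 100 data and chunk_size=20 the temporary is ~48 MB.
        K[start:start + chunk_size] = np.minimum(
            block[:, None, :], Y[None, :, :]
        ).sum(axis=2)
    return K
```

With the question's data this would be called as K = hik_gram(instances). The result is an ordinary precomputed kernel matrix, so it can be used wherever one is accepted, for example scikit-learn's SVC(kernel="precomputed"). Larger chunk_size values trade memory for fewer Python-level iterations.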

1 answer:

Answer 0 (score: 5)

import numpy as np

def fast_hik(x, y):
    # element-wise minimum, then sum: no Python-level loop
    return np.minimum(x, y).sum()
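np.minimum compares the two arrays element by element, so the one-liner computes the same sum as the question's loop; only the iteration moves from Python into NumPy's compiled code. A quick sanity check, assuming the question's my_hik is rewritten with Python 3's range in place of xrange:

```python
import numpy as np

def my_hik(x, y):
    # The question's loop version (range instead of Python 2's xrange)
    summe = 0
    for i in range(len(x)):
        summe += min(x[i], y[i])
    return summe

def fast_hik(x, y):
    # The answer's vectorized version
    return np.minimum(x, y).sum()

rng = np.random.default_rng(42)
x = rng.standard_normal(100)
y = rng.standard_normal(100)
# isclose rather than ==, since NumPy may sum in a different order
print(np.isclose(my_hik(x, y), fast_hik(x, y)))  # True
```

fast_hik can be passed to pairwise_kernels as metric=fast_hik exactly like my_hik; each call becomes cheaper, though the number of calls stays the same.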

Timings:

>>> x = np.random.randn(100)
>>> y = np.random.randn(100)
>>> %timeit my_hik(x, y)
10000 loops, best of 3: 50.3 µs per loop
>>> %timeit fast_hik(x, y)
100000 loops, best of 3: 5.55 µs per loop

Longer vectors get an even larger speedup:

>>> x = np.random.randn(1000)
>>> y = np.random.randn(1000)
>>> %timeit my_hik(x, y)
1000 loops, best of 3: 498 µs per loop
>>> %timeit fast_hik(x, y)
100000 loops, best of 3: 7.92 µs per loop
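The %timeit lines above are IPython magics. To repeat the comparison in a plain Python script, the standard-library timeit module can be used; the sketch below is a minimal version, with compare a hypothetical helper name. Absolute numbers depend on the machine and the Python and NumPy versions, so no expected output is given. The pattern in the numbers above is that the loop's cost grows roughly linearly with vector length (about 50 µs at 100 elements, about 500 µs at 1000) while the NumPy version is dominated by fixed per-call overhead (5.55 µs vs 7.92 µs), which is why the speedup widens for longer vectors.

```python
import timeit
import numpy as np

def my_hik(x, y):
    # Loop version from the question (Python 3 range)
    summe = 0
    for i in range(len(x)):
        summe += min(x[i], y[i])
    return summe

def fast_hik(x, y):
    # Vectorized version from the answer
    return np.minimum(x, y).sum()

def compare(n, number=200):
    """Hypothetical helper: mean seconds per call (loop, vectorized) for length-n vectors."""
    x = np.random.randn(n)
    y = np.random.randn(n)
    t_loop = timeit.timeit(lambda: my_hik(x, y), number=number) / number
    t_fast = timeit.timeit(lambda: fast_hik(x, y), number=number) / number
    return t_loop, t_fast

for n in (100, 1000):
    t_loop, t_fast = compare(n)
    print(f"n={n}: loop {t_loop * 1e6:.1f} µs, numpy {t_fast * 1e6:.1f} µs, "
          f"speedup {t_loop / t_fast:.0f}x")
```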