How can I optimize the computation of this function in numpy?

Time: 2017-04-22 17:32:01

Tags: python python-2.7 numpy machine-learning kernel-density

I want to implement the following problem in numpy, and here is my code.
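For reference, the quantity that the loop code in this question evaluates can be read off from its constants c1, c2 and c3. What follows is a reconstruction from that code, not a quotation from the post. The notation is an assumption made for illustration: x_j are the k rows of X, y_i are the m rows of Y, and d is the number of columns.

```latex
\operatorname{mean}(L_B)
  = \frac{1}{m}\sum_{i=1}^{m}
    \log\left[\frac{1}{k}\sum_{j=1}^{k}\prod_{l=1}^{d}
      \frac{1}{\sqrt{2\pi\sigma^{2}}}
      \exp\left(-\frac{(y_{il}-x_{jl})^{2}}{2\sigma^{2}}\right)\right],
\qquad
c_1 = 2\sigma^{2},\quad
c_2 = \tfrac{1}{2}\log(\pi c_1) = \log\sqrt{2\pi\sigma^{2}},\quad
c_3 = \log\frac{1}{k}.
```

Read this way, the code appears to compute the mean log-likelihood of the rows of Y under a Gaussian kernel density estimate centered on the rows of X, which is consistent with the kernel-density tag.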

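I tried the following numpy code, which uses a single for loop, to solve this problem. Is there a more efficient way to do this calculation? I would really appreciate any help!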

k, d = X.shape
m = Y.shape[0]

c1 = 2.0*sigma**2
c2 = 0.5*np.log(np.pi*c1)
c3 = np.log(1.0/k)

L_B = np.zeros((m,))
for i in xrange(m):
    if i % 100 == 0:
        print i
    L_B[i] = np.log(np.sum(np.exp(np.sum(-np.divide(
                np.power(X-Y[i,:],2), c1)-c2,1)+c3)))

print np.mean(L_B)

I have considered creating a 3D tensor so that this calculation can be done by broadcasting, but that would waste a lot of memory when m is large.

I also believe that np.expand_dims(X, 2).repeat(Y.shape[0], 2)-Y uses nothing but a for loop under the hood, so it may not be that efficient. Please correct me if I'm wrong.

Any thoughts?

1 answer:

Answer 0: (score: 5)

Optimization stage #1

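My first level of optimization was a direct translation of the loopy code into a broadcasting-based version that introduces a new axis. As such, it is not memory efficient. It is listed below: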

p1 = (-((X[:,None] - Y)**2)/c1)-c2
p11 = p1.sum(2)
p2 = np.exp(p11+c3)
out = np.log(p2.sum(0)).mean()
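To make the memory remark concrete, here is a small self-contained sketch that checks the stage #1 expression against the question's loop on small inputs, and estimates the size of the (k, m, d) intermediate for the larger shapes used in the timings. The function names loop_reference, broadcast_stage1 and intermediate_bytes are made up for this illustration. The loop is ported to Python 3 (range instead of xrange), and the inputs are cast to float64, which is an assumption about the intended dtype.

```python
import numpy as np

def loop_reference(X, Y, sigma):
    """The question's loop, ported to Python 3."""
    k, d = X.shape
    m = Y.shape[0]
    c1 = 2.0 * sigma**2
    c2 = 0.5 * np.log(np.pi * c1)
    c3 = np.log(1.0 / k)
    L_B = np.zeros(m)
    for i in range(m):
        inner = np.sum(-(X - Y[i]) ** 2 / c1 - c2, axis=1) + c3
        L_B[i] = np.log(np.sum(np.exp(inner)))
    return L_B.mean()

def broadcast_stage1(X, Y, sigma):
    """Stage #1: a single broadcasted (k, m, d) intermediate."""
    k, d = X.shape
    c1 = 2.0 * sigma**2
    c2 = 0.5 * np.log(np.pi * c1)
    c3 = np.log(1.0 / k)
    p1 = -((X[:, None] - Y) ** 2) / c1 - c2  # shape (k, m, d)
    p2 = np.exp(p1.sum(2) + c3)              # shape (k, m)
    return np.log(p2.sum(0)).mean()

def intermediate_bytes(k, m, d, itemsize=8):
    """Bytes needed by one float64 array of shape (k, m, d)."""
    return k * m * d * itemsize

rng = np.random.RandomState(0)
X = rng.randint(0, 9, (20, 5)).astype(float)
Y = rng.randint(0, 9, (300, 5)).astype(float)
sigma = 2.34

ref = loop_reference(X, Y, sigma)
out = broadcast_stage1(X, Y, sigma)
print(np.allclose(ref, out))

# Size of p1 alone for k=100, m=100000, d=10
print(intermediate_bytes(100, 100000, 10) / 1e6, "MB")  # 800.0 MB
```

The 800 MB figure covers only one temporary. The subtraction, the squaring and the division can each allocate another array of the same shape, so peak usage is likely higher.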

Optimization stage #2

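Next, with the intention of separating out the operations on constants, I brought in a few optimizations and ended up with the following: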

c10 = -c1
c20 = X.shape[1]*c2

subs = (X[:,None] - Y)**2
p00 = subs.sum(2)
p10 = p00/c10
p11 = p10-c20
p2 = np.exp(p11+c3)
out = np.log(p2.sum(0)).mean()
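The reason the constants can be pulled out is that c2 does not depend on the column index, so summing it over the d columns simply multiplies it by d. Written out for one pair of rows x_j and y_i (the index notation is introduced here for illustration and does not appear in the answer):

```latex
\sum_{l=1}^{d}\left(-\frac{(x_{jl}-y_{il})^{2}}{c_1} - c_2\right)
  = -\frac{1}{c_1}\sum_{l=1}^{d}(x_{jl}-y_{il})^{2} - d\,c_2
  = \frac{\lVert x_j - y_i\rVert^{2}}{c_{10}} - c_{20},
\qquad
c_{10} = -c_1,\quad c_{20} = d\,c_2 .
```

In code terms, the division and the subtraction now run on the (k, m) array p00 rather than on the full (k, m, d) array.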

Optimization stage #3

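Going further and looking for more places where operations could be optimized, I ended up using SciPy's cdist to replace the heavyweight work of squaring and sum-reduction. This should be fairly memory efficient, and it gives us the final implementation, shown below: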

from scipy.spatial.distance import cdist

# Setup constants
c10 = -c1
c20 = X.shape[1]*c2
c30 = c20-c3
c40 = np.exp(c30)
c50 = np.log(c40)

# Get stagewise operations corresponding to loopy ones
p1 = cdist(X, Y, 'sqeuclidean')
p2 = np.exp(p1/c10).sum(0)
out = np.log(p2).mean() - c50
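One property the final version shares with the original loop is that np.exp can underflow to zero when the squared distances are large relative to c1, in which case np.log returns -inf. The sketch below is not part of the answer. It swaps the explicit exp/sum/log for scipy.special.logsumexp, assuming a SciPy version that provides it in scipy.special. The names stage3_app and stable_app are made up for illustration, and the constant is written directly as d*c2 - c3, which equals c50 above.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.special import logsumexp

def stage3_app(X, Y, sigma):
    """Stage #3 as in the answer: explicit exp, sum, then log."""
    k, d = X.shape
    c1 = 2.0 * sigma**2
    c2 = 0.5 * np.log(np.pi * c1)
    c3 = np.log(1.0 / k)
    sq = cdist(X, Y, 'sqeuclidean')          # shape (k, m)
    p2 = np.exp(-sq / c1).sum(0)             # shape (m,)
    with np.errstate(divide='ignore'):       # silence the log(0) warning
        return np.log(p2).mean() - (d * c2 - c3)

def stable_app(X, Y, sigma):
    """Same quantity, with the log-sum-exp done in one stable call."""
    k, d = X.shape
    c1 = 2.0 * sigma**2
    c2 = 0.5 * np.log(np.pi * c1)
    c3 = np.log(1.0 / k)
    sq = cdist(X, Y, 'sqeuclidean')          # shape (k, m)
    lse = logsumexp(-sq / c1, axis=0)        # shape (m,)
    return lse.mean() - (d * c2 - c3)

# Moderate bandwidth: both versions agree.
rng = np.random.RandomState(0)
X = rng.randint(0, 9, (20, 5)).astype(float)
Y = rng.randint(0, 9, (300, 5)).astype(float)
agree = np.allclose(stage3_app(X, Y, 2.34), stable_app(X, Y, 2.34))
print(agree)

# Tiny bandwidth with far-apart points: exp underflows in stage3_app.
X_far = np.zeros((3, 2))
Y_far = np.full((4, 2), 10.0)
naive = stage3_app(X_far, Y_far, 0.1)
stable = stable_app(X_far, Y_far, 0.1)
print(naive, stable)
```

Whether this matters depends on the data and on sigma. For the inputs used in the timings the two versions should give the same result, and logsumexp does some extra work, so it may be somewhat slower.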

Runtime test

Approaches:

def loopy_app(X, Y, sigma):
    k, d = X.shape
    m = Y.shape[0]

    c1 = 2.0*sigma**2
    c2 = 0.5*np.log(np.pi*c1)
    c3 = np.log(1.0/k)

    L_B = np.zeros((m,))
    for i in xrange(m):
        L_B[i] = np.log(np.sum(np.exp(np.sum(-np.divide(
                    np.power(X-Y[i,:],2), c1)-c2,1)+c3)))

    return np.mean(L_B)

def vectorized_app(X, Y, sigma):
    # Setup constants
    k, d = X.shape
    c1 = 2.0*sigma**2
    c2 = 0.5*np.log(np.pi*c1)
    c3 = np.log(1.0/k)

    c10 = -c1
    c20 = X.shape[1]*c2
    c30 = c20-c3
    c40 = np.exp(c30)
    c50 = np.log(c40)

    # Get stagewise operations corresponding to loopy ones
    p1 = cdist(X, Y, 'sqeuclidean')
    p2 = np.exp(p1/c10).sum(0)
    out = np.log(p2).mean() - c50
    return out

Timings and verification:

In [294]: # Setup inputs with m(=Y.shape[0]) being a large number
     ...: X = np.random.randint(0,9,(100,10))
     ...: Y = np.random.randint(0,9,(10000,10))
     ...: sigma = 2.34
     ...: 

In [295]: np.allclose(loopy_app(X, Y, sigma),vectorized_app(X, Y, sigma))
Out[295]: True

In [296]: %timeit loopy_app(X, Y, sigma)
1 loops, best of 3: 225 ms per loop

In [297]: %timeit vectorized_app(X, Y, sigma)
10 loops, best of 3: 23.6 ms per loop

In [298]: # Setup inputs with m(=Y.shape[0]) being a much larger number
     ...: X = np.random.randint(0,9,(100,10))
     ...: Y = np.random.randint(0,9,(100000,10))
     ...: sigma = 2.34
     ...: 

In [299]: np.allclose(loopy_app(X, Y, sigma),vectorized_app(X, Y, sigma))
Out[299]: True

In [300]: %timeit loopy_app(X, Y, sigma)
1 loops, best of 3: 2.27 s per loop

In [301]: %timeit vectorized_app(X, Y, sigma)
1 loops, best of 3: 243 ms per loop

Around a 10x speedup!