我想用Cython优化此Python代码:
def updated_centers(point, start, center):
return np.array([__cluster_mean(point[start[c]:start[c + 1]], center[c]) for c in range(center.shape[0])])
def __cluster_mean(point, center):
return (np.sum(point, axis=0) + center) / (point.shape[0] + 1)
我的Cython代码:
cimport cython
cimport numpy as np
import numpy as np
# C-compatible Numpy integer type.
DTYPE = np.intc
@cython.boundscheck(False) # Deactivate bounds checking
@cython.wraparound(False) # Deactivate negative indexing.
@cython.cdivision(True) # Deactivate division by 0 checking.
def updated_centers(double [:,:] point, int [:] label, double [:,:] center):
if (point.shape[0] != label.size) or (point.shape[1] != center.shape[1]) or (center.shape[0] > point.shape[0]):
raise ValueError("Incompatible dimensions")
cdef Py_ssize_t i, c, j
cdef Py_ssize_t n = point.shape[0]
cdef Py_ssize_t m = point.shape[1]
cdef Py_ssize_t nc = center.shape[0]
# Updated centers. We accumulate point and center contributions into this array.
# Start by adding the (unscaled) center contributions.
new_center = np.zeros([nc, m])
new_center[:] = center
# Counter array. Will contain cluster sizes (including center, whose contribution
# is again added here) at the end of the point loop.
cluster_size = np.ones([nc], dtype=DTYPE)
# Add point contributions.
for i in range(n):
c = label[i]
cluster_size[c] += 1
for j in range(m):
new_center[c, j] += point[i, j]
# Scale center+point summation to be a mean.
for c in range(nc):
for j in range(m):
new_center[c, j] /= cluster_size[c]
return new_center
但是,Cython的速度比python慢:
Python: %timeit f.updated_centers(point, start, center)
331 ms ± 11.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Cython: %timeit fx.updated_centers(point, label, center)
433 ms ± 14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
HTML显示几乎所有行都是黄色的:分配数组+ =,/ =。我期望Cython快一个数量级。我在做什么错了?
答案 0 :(得分:0)
关键是编写类似于Python代码的Cython代码,仅在必要时访问数组。
cimport cython
cimport numpy as np
import numpy as np
# C-compatible Numpy integer type.
DTYPE = np.intc
@cython.boundscheck(False) # Deactivate bounds checking
@cython.wraparound(False) # Deactivate negative indexing.
@cython.cdivision(True) # Deactivate division by 0 checking.
def updated_centers(double [:, :] point, int [:] start, double [:, :] center):
"""Returns the updated list of cluster centers (damped center of mass Pahkira scheme). Cluster c
(and center[c]) corresponds to the point range point[start[c]:start[c+1]]."""
if (point.shape[1] != center.shape[1]) or (center.shape[0] > point.shape[0]) or (start.size != center.shape[0] + 1):
raise ValueError("Incompatible dimensions")
# Py_ssize_t is the proper C type for Python array indices.
cdef Py_ssize_t i, c, j, cluster_start, cluster_stop, cluster_size
cdef Py_ssize_t n = point.shape[0]
cdef Py_ssize_t m = point.shape[1]
cdef Py_ssize_t nc = center.shape[0]
cdef double center_of_mass
# Updated centers. We accumulate point and center contributions into this array.
# Start by adding the (unscaled) center contributions.
new_center = np.zeros([nc, m])
cluster_start = start[0]
for c in range(nc):
cluster_stop = start[c + 1]
cluster_size = cluster_stop - cluster_start + 1
for j in range(m):
center_of_mass = center[c, j]
for i in range(cluster_start, cluster_stop):
center_of_mass += point[i, j]
new_center[c, j] = center_of_mass / cluster_size
cluster_start = cluster_stop
return np.asarray(new_center)
我们使用相同的API
n, m = 100000, 5; k = n//2; point = np.random.rand(n, m); start = 2*np.arange(k+1, dtype=np.intc); center=np.random.rand(k, m);
%timeit fx.updated_centers(point, start, center)
31 ms ± 2.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit f.updated_centers(point, start, center)
734 ms ± 17.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
答案 1 :(得分:0)
您需要告诉Cython new_center
和cluster_size
是数组:
cdef double[:, :] new_center = np.zeros((nc, m))
...
cdef int[:] cluster_size = np.ones((nc,), dtype=DTYPE)
...
没有这些类型注释,Cython无法生成有效的C代码,并且在访问这些数组时必须调用Python解释器。这就是为什么访问这些数组的cython -a
的HTML输出中的行为黄色的原因
仅通过这两个小修改,我们立即看到我们想要的加速:
%timeit python_updated_centers(point, start, center)
392 ms ± 41.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit cython_updated_centers(point, start, center)
1.18 ms ± 145 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
答案 2 :(得分:0)
对于这样简单的内核,您还可以使用pythran获得不错的加速比:
#pythran export updated_centers(float64 [:, :], int32 [:] , float64 [:, :] )
import numpy as np
def updated_centers(point, start, center):
return np.array([__cluster_mean(point[start[c]:start[c + 1]], center[c]) for c in range(center.shape[0])])
def __cluster_mean(point, center):
return (np.sum(point, axis=0) + center) / (point.shape[0] + 1)
使用pythran updated_centers.py
进行编译并获得以下计时:
Numpy代码(相同的代码,未编译):
$ python -m perf timeit -s 'import numpy as np; n, m = 100000, 5; k = n//2; point = np.random.rand(n, m); start = 2*np.arange(k+1, dtype=np.int32); center=np.random.rand(k, m); from updated_centers import updated_centers' 'updated_centers(point, start, center)'
.....................
Mean +- std dev: 271 ms +- 12 ms
Pythran(编译后):
$ python -m perf timeit -s 'import numpy as np; n, m = 100000, 5; k = n//2; point = np.random.rand(n, m); start = 2*np.arange(k+1, dtype=np.int32); center=np.random.rand(k, m); from updated_centers import updated_centers' 'updated_centers(point, start, center)'
.....................
Mean +- std dev: 12.8 ms +- 0.3 ms