I am trying to run the following Numba nopython-compatible function with target='cuda':
import numba
import numpy as np

@numba.jit(nopython=True)
def hermite_polynomials(X, N):
    r'''
    Evaluate the orthonormal Hermite polynomials on
    :math:`(\mathbb{R},\frac{1}{\sqrt{2\pi}}\exp(-x^2/2)dx)` in :math:`X\subset\mathbb{R}`

    :param X: Locations of desired evaluations
    :type X: One dimensional np.array
    :param N: Number of polynomials
    :rtype: numpy.array of shape :code:`X.shape[0] x N`
    '''
    out = np.zeros((X.shape[0], N))
    deg = N - 1
    # factorial[0, k] ends up holding k!
    factorial = np.ones((1, N))
    for i in range(1, N):
        factorial[0, i:] *= i
    orthonormalizer = 1 / np.sqrt(factorial)
    if deg < 1:
        out = np.ones((X.shape[0], 1))
    else:
        out[:, 0] = np.ones((X.shape[0],))
        out[:, 1] = X
        # Three-term recurrence: He_{n+1}(x) = x He_n(x) - n He_{n-1}(x)
        for n in range(1, deg):
            out[:, n + 1] = X * out[:, n] - n * out[:, n - 1]
    return out * orthonormalizer
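However, I have not found any example code that is both easy to understand (I only have Python and MATLAB experience and am not a computer scientist) and actually useful (the only examples I found were of the a+b kind).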
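So far I have arrived at the following function, which needs the output array to be passed in (I could not create the array myself: cuda.local.array((N,1),dtype=float64) raised a ConstantInferenceError). I have accepted that I have to do the multiplications sequentially, hence the extra for loops, but even that does not work, because I get an Invalid usage of * with parameters (array(float64, 1d, C), float64) error.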
@numba.jit(target='cuda')
def hermite_polynomials2(X, N, out):
    r'''
    Evaluate the orthonormal Hermite polynomials on
    :math:`(\mathbb{R},\frac{1}{\sqrt{2\pi}}\exp(-x^2/2)dx)` in :math:`X\subset\mathbb{R}`

    :param X: Locations of desired evaluations
    :type X: One dimensional np.array
    :param N: Number of polynomials
    :rtype: numpy.array of shape :code:`X.shape[0] x N`
    '''
    deg = N - 1
    L = X.shape[0]
    if deg == 0:
        return
    else:
        out[:, 1] = X
        for n in range(1, deg):
            for j in range(L):
                # Fails with: Invalid usage of * with parameters (array(float64, 1d, C), float64)
                out[j, n + 1] = X * out[j, n] - n * out[j, n - 1]
        factorial = 1
        for i in range(1, N):
            factorial *= i
            for j in range(L):
                out[j, i] /= np.sqrt(factorial)
        return
How can I do the multiplication?
Answer (score: 1)
You probably want something like:
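The error message says that the left operand of `*` is the whole 1-D array `X`, while the right operand `out[j, n]` is a single float64. Inside the `for j` loop the intended operand is presumably the single point `X[j]`. This plain NumPy sketch, with hypothetical values and no GPU needed, shows the difference between the two expressions:

```python
import numpy as np

# Hypothetical inputs: three evaluation points, three polynomial columns.
X = np.array([0.5, 1.0, 2.0])
out = np.zeros((3, 3))
out[:, 0] = 1.0  # He_0(x) = 1
out[:, 1] = X    # He_1(x) = x

j, n = 2, 1
whole = X * out[j, n]      # 1-D array times scalar -> an array of length 3
single = X[j] * out[j, n]  # scalar times scalar -> one number
# Scalar form of the recurrence line: He_2(x) = x**2 - 1, here at x = 2.0
out[j, n + 1] = X[j] * out[j, n] - n * out[j, n - 1]
```

Plain NumPy silently broadcasts `whole` and only fails when it is stored into the scalar slot `out[j, n + 1]`, whereas Numba's CUDA target rejects the array-times-scalar expression at compile time.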
Note, however, that the whole exercise of writing this kernel is mostly futile. To quote the relevant documentation:
For best performance, users should write code such that each thread is dealing with a single element at a time.
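The kernel you have written is completely serial. It will be slower than the CPU version. You would need to write the code in a very different way for it to be of any value on the GPU.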