Question

I am trying to run the following Numba nopython-compatible function with target='cuda':

    import numba
    import numpy as np

    @numba.jit(nopython = True)
    def hermite_polynomials(X, N):
        r'''
        Evaluate the orthonormal Hermite polynomials on 
        :math:`(\mathbb{R},\frac{1}{\sqrt{2\pi}}\exp(-x^2/2)dx)` in :math:`X\subset\mathbb{R}`


        :param X: Locations of desired evaluations
        :type X:  One dimensional np.array
        :param N: Number of polynomials
        :rtype: numpy.array of shape :code:`X.shape[0] x N`
        '''
        out = np.zeros((X.shape[0], N))
        deg = N - 1
        factorial = np.ones((1,N))
        for i in range(1,N):
            factorial[0,i:]*=i
        orthonormalizer = 1 / np.sqrt(factorial)
        if deg < 1:
            out = np.ones((X.shape[0], 1))
        else:
            out[:, 0] = np.ones((X.shape[0],))
            out[:, 1] = X
            for n in range(1, deg):
                out[:, n + 1] = X * out[:, n] - n * out[:, n - 1]
        return out * orthonormalizer

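However, I have not found any example code that is both easy to understand (I only have Python and MATLAB experience and am not a computer scientist) and actually useful (I have only found examples of the a+b kind).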

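So far I have arrived at the following function, which needs to be passed an output array (I cannot define the array myself: cuda.local.array((N,1),dtype=float64) results in a ConstantInferenceError). I accepted that I have to do the multiplications sequentially, hence the extra for loop, but even that does not work, because I get the error Invalid usage of * with parameters (array(float64, 1d, C), float64).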

    @numba.jit(target = 'cuda')
    def hermite_polynomials2(X, N,out):
        r'''
        Evaluate the orthonormal Hermite polynomials on 
        :math:`(\mathbb{R},\frac{1}{\sqrt{2\pi}}\exp(-x^2/2)dx)` in :math:`X\subset\mathbb{R}`


        :param X: Locations of desired evaluations
        :type X:  One dimensional np.array
        :param N: Number of polynomials
        :rtype: numpy.array of shape :code:`X.shape[0] x N`
        '''
        deg = N-1
        L = X.shape[0]
        if deg  == 0:
            return
        else:     
            out[:, 1] = X
            for n in range(1, deg):
                for j in range(L):
                    out[j, n + 1] = X * out[j, n] - n * out[j, n - 1]
        factorial = 1
        for i in range(1,N):
            factorial *= i
            for j in range(L):
                out[j,i] /= np.sqrt(factorial)
        return

How do I do the multiplication?

Answer 1

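You probably want something like this: index X element by element inside the inner loop, that is, out[j, n + 1] = X[j] * out[j, n] - n * out[j, n - 1]. X is a one-dimensional array while out[j, n] is a single float64, which is exactly the pair of operand types named in the error message.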
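To make the element-wise indexing concrete, here is a minimal sketch of the sequential loop written in plain Python with NumPy, so that it can be checked on the CPU. The name hermite_polynomials_loop is hypothetical, the column-0 initialisation is added here because the question's version leaves it to the caller, and whether the same body compiles unchanged under a particular Numba CUDA version is an assumption.

```python
import math

import numpy as np


def hermite_polynomials_loop(X, N, out):
    """Fill out[j, k] with the orthonormal Hermite polynomial of degree k at X[j].

    Hypothetical helper: every array access is a scalar access, which is the
    form the error message in the question asks for.
    """
    L = X.shape[0]
    for j in range(L):
        out[j, 0] = 1.0          # degree 0 is the constant 1
        if N > 1:
            out[j, 1] = X[j]     # degree 1 is x itself
    for n in range(1, N - 1):
        for j in range(L):
            # X[j] is a float64; the original X * out[j, n] used the whole array
            out[j, n + 1] = X[j] * out[j, n] - n * out[j, n - 1]
    factorial = 1.0
    for i in range(1, N):
        factorial *= i           # factorial == i! at this point
        for j in range(L):
            out[j, i] /= math.sqrt(factorial)


X = np.array([0.0, 2.0])
out = np.zeros((X.shape[0], 4))
hermite_polynomials_loop(X, 4, out)
```

For x = 2 the unnormalised values are 1, 2, 3, 2, so the last row of out holds 1, 2, 3/sqrt(2), 2/sqrt(6).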

But note that the whole exercise of writing this kernel is mostly futile. Quoting the relevant documentation:

"For best performance, users should write code such that each thread is dealing with a single element at a time."

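The kernel you have written will be completely serial. It will be slower than the CPU version. You will need to write the code in a very different way for it to be of any value on a GPU.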
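One possible way to restructure the computation is sketched below: each thread handles one entry of X and fills the corresponding row of out, since the recurrence only couples values along a row. This is an illustration under stated assumptions, not a tuned implementation. The names hermite_row, hermite_kernel and hermite_polynomials_gpu are hypothetical, the block size of 128 is an arbitrary choice, and the try/except around the import is only there so that the CPU reference function can be exercised on a machine without Numba.

```python
import math

import numpy as np

try:
    from numba import cuda
except ImportError:  # lets the plain-Python reference below run without Numba
    cuda = None


def hermite_row(x, row):
    """Fill row[k] with the orthonormal Hermite polynomial of degree k at x."""
    n_poly = row.shape[0]
    row[0] = 1.0
    if n_poly > 1:
        row[1] = x
    # Three-term recurrence for the unnormalised polynomials
    for n in range(1, n_poly - 1):
        row[n + 1] = x * row[n] - n * row[n - 1]
    # Divide degree k by sqrt(k!) to orthonormalise
    factorial = 1.0
    for k in range(1, n_poly):
        factorial *= k
        row[k] /= math.sqrt(factorial)


if cuda is not None:
    # Compile the same function as a device function (assumes a CUDA-capable setup)
    hermite_row_device = cuda.jit(device=True)(hermite_row)

    @cuda.jit
    def hermite_kernel(X, out):
        i = cuda.grid(1)             # global index of this thread
        if i < X.shape[0]:           # guard threads past the end of X
            hermite_row_device(X[i], out[i])

    def hermite_polynomials_gpu(X, N, threads_per_block=128):
        d_X = cuda.to_device(X)
        d_out = cuda.device_array((X.shape[0], N), dtype=np.float64)
        blocks = (X.shape[0] + threads_per_block - 1) // threads_per_block
        hermite_kernel[blocks, threads_per_block](d_X, d_out)
        return d_out.copy_to_host()
```

On a machine with a CUDA device, hermite_polynomials_gpu(X, N) would be called with a one-dimensional float64 array X. Whether this beats the CPU version depends on how large X is, because the host-to-device transfers have a fixed cost.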

Python: Executing a simple function on the GPU with Numba. `Invalid usage of * with parameters (array(float64, 1d, C), float64)`
