Question

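I would like to know how to implement the numpy function any() on the GPU (using Numba Python). The any() function takes an array and returns True if at least one element of the input evaluates to True.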

Something like:

from numba import vectorize

@vectorize(["boolean(boolean)"], target='cuda')
def AnyFunction(a):
    return any(a)

or

@vectorize(["boolean(boolean)"], target='cuda')
def AnyFunction(a):
    for i in range(len(a)):
        if a[i]==True:
            return True
    return False

Answer 1

Perhaps the most difficult aspect of this function's operation is the reduction aspect. Testing each item for True/False is easy to do element-wise (for example with vectorize), but there is no (immediate) way to combine the many per-element results into a single value, which is the reduction part. In fact, vectorize is not designed to solve this kind of problem, at least not directly.

However, numba cuda provides some help for simple reductions (like this one) that does not force you to write a custom numba cuda kernel.

Here is one possible approach:


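Some comments on performance:

This may not be the fastest way to do it. However, I got the impression from your question that you are looking for something close to plain Python.
Writing a custom CUDA kernel in numba could probably do this job faster.
If you are serious about performance, I suggest trying to combine this operation with other work that needs to be done on the GPU. In that case, a custom kernel gives you the most flexibility and lets you do the whole task with the highest performance.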

How can I convert the Python function any() into CUDA Python compatible code (to run on the GPU)?
