How do I parallelize a for loop inside some_module.forward(some_input) (on the GPU)?

Time: 2019-02-25 12:52:10

Tags: pytorch

Suppose I have a model (mostly pseudocode):

class SomeLayer(nn.Module):
    def __init__(self, s):
        super().__init__()
        # init some layers etc.
        self.N = s * s

    def forward(self, input_tensor):
        # initialize some variables
        some_results = []
        for iter_i in range(self.N):
            # do independent operations on different parts of input_tensor;
            # each operation is basically a copy of a subtensor of
            # input_tensor whose size depends on iter_i, e.g.:
            some_results.append(input_tensor[..., : iter_i + 1].clone())
        return some_results

What is the correct way to parallelize this kind of for loop? At the moment I am planning to write a small CUDA kernel for this and load it from Python, but that feels like overkill; I think there should be a simple way to do it, though I could not find one in the docs.
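(One built-in option short of a custom CUDA kernel is torch.jit.fork / torch.jit.wait, which turns each loop body into an asynchronous task; note that true inter-task parallelism is only realized when the module is compiled with torch.jit.script, while in eager mode the tasks run sequentially. A minimal sketch, using a hypothetical prefix-copy as the per-iteration work since the real operation is not shown:)

```python
import torch
import torch.nn as nn


class SomeLayer(nn.Module):
    def __init__(self, s):
        super().__init__()
        self.N = s * s

    def forward(self, input_tensor):
        # Fork one task per iteration; each task copies a subtensor
        # whose size depends on the loop index (hypothetical example).
        futures = [
            torch.jit.fork(lambda t, i: t[..., : i + 1].clone(), input_tensor, i)
            for i in range(self.N)
        ]
        # Wait for all tasks and collect their results in order.
        return [torch.jit.wait(f) for f in futures]
```

Also worth noting: each GPU op is already parallel across its elements, and CUDA kernel launches are asynchronous with respect to the host, so for small N the plain Python loop may be close to optimal already; fusing the work into one batched op usually helps more than task-level parallelism.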

0 Answers:

There are no answers yet.