How can I profile a deep learning application layer by layer with the nvprof command line?

Time: 2018-07-23 19:16:01

Tags: deep-learning nvidia nvprof

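I want to capture layer-wise information from a GAN application. From this information, I want to visualize the total time spent (per layer) and the stall breakdown (per layer).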

Here is sample code from my GAN application:

import torch.nn as nn

# G(z)
class generator(nn.Module):
    # initializers
    def __init__(self, d=128):
        super(generator, self).__init__()
        self.deconv1 = nn.ConvTranspose2d(100, d*8, 4, 1, 0)
        self.deconv1_bn = nn.BatchNorm2d(d*8)
        self.deconv2 = nn.ConvTranspose2d(d*8, d*4, 4, 2, 1)
        self.deconv2_bn = nn.BatchNorm2d(d*4)
        self.deconv3 = nn.ConvTranspose2d(d*4, d*2, 4, 2, 1)
        self.deconv3_bn = nn.BatchNorm2d(d*2)
        self.deconv4 = nn.ConvTranspose2d(d*2, d, 4, 2, 1)
        self.deconv4_bn = nn.BatchNorm2d(d)
        self.deconv5 = nn.ConvTranspose2d(d, 1, 4, 2, 1)
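A note on why layers are hard to see in the profiler: nvprof records CUDA kernels, not PyTorch modules, so a name such as deconv1 does not appear in its output by itself. One possible way to label layer boundaries is to wrap each layer call in an NVTX range. The sketch below is only an illustration under assumptions: the helper `layer_range` and the list `marked` are hypothetical names, and the `forward` method it refers to is not shown in the post. It falls back to a no-op when PyTorch or CUDA is unavailable.

```python
from contextlib import contextmanager

try:
    import torch
    _HAS_CUDA = torch.cuda.is_available()
except ImportError:  # torch is not installed; ranges become no-ops
    torch = None
    _HAS_CUDA = False

marked = []  # range names seen so far, kept only so the sketch can be checked


@contextmanager
def layer_range(name):
    """Wrap a layer call in an NVTX range named after the layer (hypothetical helper)."""
    marked.append(name)
    if _HAS_CUDA:
        torch.cuda.nvtx.range_push(name)
    try:
        yield
    finally:
        if _HAS_CUDA:
            torch.cuda.nvtx.range_pop()


# Assumed use inside generator.forward, which the post does not show:
#     with layer_range("deconv1"):
#         x = self.deconv1_bn(self.deconv1(x))
with layer_range("deconv1"):
    pass  # the layer call would go here
```

Because kernel launches are asynchronous, a range like this brackets the launches issued on the CPU thread rather than the exact GPU execution interval, so the per-layer attribution is approximate.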

I have a few questions here:

1) Suppose I want to monitor the deconv1 layer. Should I pass it in the --kernels argument?

What should my nvprof arguments look like?
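As a rough illustration of the shape such a command line might take, the sketch below assembles one as an argument list. It rests on assumptions: the filter `gemm` is a placeholder kernel-name pattern (the filter is matched against CUDA kernel names, so the real names behind deconv1 would have to be read from the profiler output first), `gan.py` is a hypothetical script name, and the stall metrics listed are available only on some GPU generations. `build_nvprof_command` is a hypothetical helper, not part of any library.

```python
import shlex


def build_nvprof_command(kernel_filter, metrics, app_argv):
    """Assemble an nvprof command line as an argument list (hypothetical helper)."""
    cmd = ["nvprof"]
    if kernel_filter:
        # --kernels scopes the --metrics/--events options that follow it
        cmd += ["--kernels", kernel_filter]
    if metrics:
        cmd += ["--metrics", ",".join(metrics)]
    return cmd + list(app_argv)


cmd = build_nvprof_command(
    kernel_filter="gemm",  # placeholder pattern, not a real kernel name from this model
    metrics=["stall_memory_dependency", "stall_exec_dependency"],
    app_argv=["python", "gan.py"],  # hypothetical script name
)
print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the command as a list makes it easy to pass to `subprocess.run` without shell quoting problems.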

2) How do I capture the kernel ID, kernel name, and invocation order with nvprof?
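For context, nvprof's --print-gpu-trace option lists individual kernel launches in chronological order. Given kernel names in that order, invocation numbers can be derived by counting. The sketch below assumes that "invocation" means the 1-based count of launches of the same kernel name. The kernel names are made up for illustration and `number_invocations` is a hypothetical helper.

```python
from collections import Counter


def number_invocations(kernel_names):
    """Pair each launch with its 1-based invocation count for that kernel name."""
    seen = Counter()
    numbered = []
    for name in kernel_names:
        seen[name] += 1
        numbered.append((name, seen[name]))
    return numbered


# Made-up kernel names in launch order, not taken from a real trace
launches = ["gemm_kernel", "bn_fw_kernel", "gemm_kernel", "relu_kernel", "gemm_kernel"]
for name, nth in number_invocations(launches):
    print(f"{name}: invocation {nth}")
```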


Edit

I found something like this:

--kernels This option changes the scope of subsequent "--events", "--metrics" options. The syntax is as following:

    <kernel name>
  or
    <context id/name>:<stream id/name>:<kernel name>:<invocation>

The context/stream IDs, names, kernel name and invocation can be regular expressions. Empty string matches any number or characters. If <context id/name> or <stream id/name> is a positive number, it's strictly matched against the CUDA context/stream ID. Otherwise it's treated as a regular expression and matched against the context/stream name

Can someone tell me what the context ID, stream ID, and invocation are here?
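In CUDA terms, a context holds a process's state on one device, a stream is an ordered queue of GPU work within a context, and the invocation identifies which launch of a kernel is meant. The toy model below mimics the matching rule quoted above so the four fields are easier to picture. It is an assumption-laden sketch, not nvprof's implementation: `Launch`, `launch_matches`, and the sample values are hypothetical, the invocation is assumed to be a 1-based launch count, and the naive split on ":" would break on patterns that themselves contain colons.

```python
import re
from collections import namedtuple

# One kernel launch as the filter sees it (hypothetical record layout)
Launch = namedtuple("Launch", "context_id stream_id kernel_name invocation")


def _field_matches(pattern, ident, name=""):
    """Positive number: strict ID match. Otherwise: regex search on the name."""
    if pattern == "":
        return True  # an empty field matches anything
    if pattern.isdigit() and int(pattern) > 0:
        return int(pattern) == ident
    return re.search(pattern, name) is not None


def launch_matches(spec, launch, context_name="", stream_name=""):
    """Check a launch against '<context>:<stream>:<kernel>:<invocation>'."""
    parts = spec.split(":")
    if len(parts) == 1:  # short form: kernel name only
        parts = ["", "", parts[0], ""]
    if len(parts) != 4:
        raise ValueError("expected 1 or 4 colon-separated fields")
    ctx, stream, kernel, invocation = parts
    return (
        _field_matches(ctx, launch.context_id, context_name)
        and _field_matches(stream, launch.stream_id, stream_name)
        and (kernel == "" or re.search(kernel, launch.kernel_name) is not None)
        and _field_matches(invocation, launch.invocation, str(launch.invocation))
    )


# Second launch of a made-up kernel on context 1, stream 7
example = Launch(context_id=1, stream_id=7, kernel_name="gemm_kernel", invocation=2)
print(launch_matches("1:7:gemm:2", example))
```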

0 answers:

No answers