Question

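I want to build a perceptual loss for video. That means my loss includes a pretrained network (I am thinking of using a 3D ResNet trained on a video recognition task), and I want to pass both the generated video and the real video through that network and take the outputs at certain layers (i.e., for each video, the activations after layer3, layer5, ... of the 3D ResNet).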
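To make the goal concrete, here is a minimal sketch of the loss I have in mind, assuming the intermediate activations have already been collected into two lists of tensors (one list per video). The name `perceptual_loss` and the choice of mean-squared error per layer are my own assumptions for illustration; an L1 distance or per-layer weights would fit the same pattern.

```python
import torch
import torch.nn.functional as F

def perceptual_loss(fake_feats, real_feats):
    """Sum of per-layer MSE between matching feature maps (illustrative choice)."""
    if len(fake_feats) != len(real_feats):
        raise ValueError("both videos must provide the same number of feature maps")
    # Detach the real features so gradients only flow through the generated video.
    return sum(F.mse_loss(f, r.detach()) for f, r in zip(fake_feats, real_feats))
```

The open part is how to obtain `fake_feats` and `real_feats` from the 3D ResNet in the first place.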

I know that with PyTorch's "models" package (torchvision.models) I can load a specific pretrained model and apply its features to the input. For example:

import torch.nn as nn
from torchvision import models

class _netVGGFeatures(nn.Module):
    def __init__(self):
        super(_netVGGFeatures, self).__init__()
        # Load the pretrained VGG16 model from torchvision onto the GPU
        self.vggnet = models.vgg16(pretrained=True).cuda()
        self.layer_ids = [2, 7, 12, 21, 30]

    def main(self, z, levels):
        layer_ids = self.layer_ids[:levels]  # for 64 this is [2, 7, 12, 21]
        id_max = layer_ids[-1] + 1  # 22
        output = []
        for i in range(id_max):
            z = self.vggnet.features[i](z)  # apply the i-th layer of vgg.features to the input
            if i in layer_ids:
                output.append(z)  # keep only the outputs of the layers listed in layer_ids
        return output

    def forward(self, z, levels):
        output = self.main(z, levels)
        return output

I tried to use "features" on my 3D ResNet, but it does not work.

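Does anyone know how to use only certain layers, without changing the network's forward function to return a list instead of a single output?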
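One direction that seems to fit this requirement is forward hooks: `nn.Module.register_forward_hook` lets a callback capture a submodule's output without touching the model's `forward`. The sketch below is only an illustration under assumptions. `TinyNet3D` is a hypothetical, randomly initialized stand-in for the pretrained 3D ResNet, and `HookedFeatures` and the layer names are made up for the example; a real model would use whatever names `named_modules()` reports.

```python
import torch
import torch.nn as nn

class TinyNet3D(nn.Module):
    """Hypothetical stand-in for a pretrained 3D ResNet (random weights)."""
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Conv3d(3, 4, 3, padding=1), nn.ReLU())
        self.layer2 = nn.Sequential(nn.Conv3d(4, 8, 3, stride=2, padding=1), nn.ReLU())
        self.layer3 = nn.Sequential(nn.Conv3d(8, 16, 3, stride=2, padding=1), nn.ReLU())
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Linear(16, 10)

    def forward(self, x):
        x = self.layer3(self.layer2(self.layer1(x)))
        return self.fc(torch.flatten(self.pool(x), 1))

class HookedFeatures(nn.Module):
    """Returns the outputs of the named submodules; the wrapped forward is unchanged."""
    def __init__(self, net, layer_names):
        super().__init__()
        self.net = net.eval()
        self.layer_names = list(layer_names)
        self._feats = {}
        modules = dict(net.named_modules())
        for name in self.layer_names:
            modules[name].register_forward_hook(self._make_hook(name))
        # Freeze the weights; gradients can still flow back to the input video.
        for p in self.net.parameters():
            p.requires_grad_(False)

    def _make_hook(self, name):
        def hook(module, inputs, output):
            self._feats[name] = output  # store this layer's activation
        return hook

    def forward(self, x):
        self._feats.clear()
        self.net(x)  # run the ordinary forward pass; hooks fill self._feats
        return [self._feats[name] for name in self.layer_names]

extractor = HookedFeatures(TinyNet3D(), ["layer1", "layer3"])
video = torch.randn(2, 3, 8, 16, 16)  # (batch, channels, frames, height, width)
feats = extractor(video)
```

A trade-off of this design is that the whole network still runs, including the layers after the last hooked one, so it costs somewhat more compute than slicing the model would.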

Using some of the layers in a pretrained model - PyTorch

0 answers: