I am trying to feed an image and a vector as inputs to my model. The image has the correct 4-d shape, but the vector I feed in does not. The image size is 424x512, while the vector has shape (18,). After using the dataloader, I get batches of shape (50x1x424x512) and (50x18). The model then raises an error, because it needs the vector to be 4-d as well. How can I do this? Here is my code:
def loadTrainingData_B(args):
    fdm = []
    tdm = []
    parameters = []
    for i in image_files[:4]:
        try:
            false_dm = np.fromfile(join(ref, i), dtype=np.int32)
            false_dm = Image.fromarray(false_dm.reshape((424, 512, 9)).astype(np.uint8)[:,:,1])
            fdm.append(false_dm)
            true_dm = np.fromfile(join(ref, i), dtype=np.int32)
            true_dm = Image.fromarray(true_dm.reshape((424, 512, 9)).astype(np.uint8)[:,:,1])
            tdm.append(true_dm)
            pos = param_filenames.index(i)
            param = np.array(params[pos, 1:])
            param = np.where(param == '-point-light-source', 1, param).astype(np.float64)
            parameters.append(param)
        except:
            print('[!] File {} not found'.format(i))
    return (fdm, parameters, tdm)
class Flat_ModelB(Dataset):
    def __init__(self, args, train=True, transform=None):
        self.args = args
        if train == True:
            self.fdm, self.parameters, self.tdm = loadTrainingData_B(self.args)
        else:
            self.fdm, self.parameters, self.tdm = loadTestData_B(self.args)
        self.data_size = len(self.parameters)
        self.transform = transforms.Compose([transforms.ToTensor()])

    def __getitem__(self, index):
        return (self.transform(self.fdm[index]).double(), torch.from_numpy(self.parameters[index]).double(), self.transform(self.tdm[index]).double())

    def __len__(self):
        return self.data_size
The error I get is:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight 32 1 5 5, but got 2-dimensional input of size [50, 18] instead
Here is the model:
class Model_B(nn.Module):
    def __init__(self, config):
        super(Model_B, self).__init__()
        self.config = config
        # CNN layers for fdm
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(16))
        self.layer2 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(32))
        self.layer3 = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(32))
        self.layer4 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=32, out_channels=32, kernel_size=5, stride=2, padding=2, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(32))
        self.layer5 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=32, out_channels=16, kernel_size=5, stride=2, padding=2, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(16))
        self.layer6 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=16, out_channels=1, kernel_size=5, stride=2, padding=2, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(1))
        # CNN layer for parameters
        self.param_layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(32))

    def forward(self, x, y):
        out = self.layer1(x)
        out_param = self.param_layer1(y)
        print("LayerParam 1 Output Shape : {}".format(out_param.shape))
        print("Layer 1 Output Shape : {}".format(out.shape))
        out = self.layer2(out)
        print("Layer 2 Output Shape : {}".format(out.shape))
        out = self.layer3(out)
        # out = torch.cat((out, out_param), dim=2)
        print("Layer 3 Output Shape : {}".format(out.shape))
        out = self.layer4(out)
        print("Layer 4 Output Shape : {}".format(out.shape))
        out = self.layer5(out)
        print("Layer 5 Output Shape : {}".format(out.shape))
        out = self.layer6(out)
        print("Layer 6 Output Shape : {}".format(out.shape))
        return out
And here is how I access the data:
for batch_idx, (fdm, parameters) in enumerate(self.data):
    if self.config.gpu:
        fdm = fdm.to(device)
        parameters = parameters.to(device)
    print('shape of parameters for model a : {}'.format(parameters.shape))
    output = self.model(fdm)
    loss = self.criterion(output, parameters)
Edit: I think my code is incorrect, since I am trying to apply a convolution to a vector of shape (18,). I tried duplicating the vector to make it (18x64) and feeding that in. It still does not work and gives the following output:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight 32 1 5 5, but got 3-dimensional input of size [4, 18, 64] instead
Failing that, I am not sure how to concatenate the 18-length vector to the output of layer 3.
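For reference, a minimal sketch that reproduces both errors (the shapes are the batch shapes above; the final view call only illustrates the (N, C, H, W) layout that nn.Conv2d expects, not a sensible fix):

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5, stride=2, padding=2)

vec2d = torch.rand(50, 18)     # the (50, 18) parameter batch
vec3d = torch.rand(4, 18, 64)  # the duplicated (4, 18, 64) version
# conv(vec2d) and conv(vec3d) both fail: nn.Conv2d needs (N, C, H, W) input

out = conv(vec2d.view(50, 1, 1, 18))  # a 4-d view runs: torch.Size([50, 32, 1, 9])
print(out.shape)                      # but convolving a flat vector is rarely meaningful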
Answer 0 (score: 1)
It looks like you are training an autoencoder model and want to parameterize it with some additional vector input in the bottleneck layer. If you want to perform some transformations on it, you have to decide whether you will need any spatial dependencies. Given the constant input size (N, 1, 424, 512), the output of layer3 will have the shape (N, 32, 53, 64). You have a lot of options, depending on the model performance you want:

1. Use nn.Linear with activations to transform the parameter vector. Then you can add extra spatial dimensions and repeat this vector at all spatial locations:

img = torch.rand((1, 1, 424, 512))
vec = torch.rand(1, 18)        # batched version of the (18,) parameter vector
layer3_out = model(img)        # `model` here meaning the encoder up to layer3
N, C, H, W = layer3_out.shape  # (1, 32, 53, 64)
param_encoder = nn.Sequential(nn.Linear(18, 30), nn.ReLU(), nn.Linear(30, 10))
param = param_encoder(vec)                                     # (1, 10)
param = param.unsqueeze(-1).unsqueeze(-1).expand(N, -1, H, W)  # (1, 10, 53, 64)
encoding = torch.cat([param, layer3_out], dim=1)  # (1, 42, 53, 64); layer4 would then need in_channels=42
2. Upsample the parameter vector (e.g. with transposed convolutions) to the size of the layer3 output; see the sketch after this list. But this is harder to implement, because you have to calculate the exact output shape to fit (N, 32, 53, 64).
3. Transform the input vector with an MLP (nn.Linear) to twice the number of channels in the layer3 output. Then use so-called Feature-wise transformations to scale and shift the feature maps from layer3 (a sketch of this also follows the list).

I would recommend starting with the first option, since it is the simplest one to implement, and then trying the others.
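For option 2, a rough sketch under stated assumptions: the small (7, 8) starting map, the two stride-2 upsampling steps, and the final F.interpolate that snaps to the exact (53, 64) size are all illustrative choices, not prescribed by the answer:

import torch
import torch.nn as nn
import torch.nn.functional as F

vec = torch.rand(4, 18)             # (N, 18) parameter batch
to_map = nn.Linear(18, 32 * 7 * 8)  # project the vector to a small feature map
x = to_map(vec).view(-1, 32, 7, 8)  # (N, 32, 7, 8)
upsample = nn.Sequential(
    nn.ConvTranspose2d(32, 32, kernel_size=4, stride=2, padding=1),  # -> (N, 32, 14, 16)
    nn.ReLU(),
    nn.ConvTranspose2d(32, 32, kernel_size=4, stride=2, padding=1),  # -> (N, 32, 28, 32)
    nn.ReLU())
x = upsample(x)
x = F.interpolate(x, size=(53, 64))  # force the exact layer3 spatial shape
print(x.shape)                       # torch.Size([4, 32, 53, 64])

The result can then be concatenated with the layer3 output along dim=1, as in the first snippet.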
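For option 3, a minimal feature-wise (FiLM-style) sketch; the hidden width of 64 and the gamma * features + beta rule are the usual recipe, assumed here rather than spelled out above:

import torch
import torch.nn as nn

N, C, H, W = 4, 32, 53, 64  # layer3 output shape from above
layer3_out = torch.rand(N, C, H, W)
vec = torch.rand(N, 18)

film = nn.Sequential(nn.Linear(18, 64), nn.ReLU(), nn.Linear(64, 2 * C))
gamma, beta = film(vec).chunk(2, dim=1)    # two (N, C) tensors: scale and shift
gamma = gamma.unsqueeze(-1).unsqueeze(-1)  # (N, C, 1, 1), broadcasts over H and W
beta = beta.unsqueeze(-1).unsqueeze(-1)
modulated = gamma * layer3_out + beta      # feature-wise scaled and shifted maps
print(modulated.shape)                     # torch.Size([4, 32, 53, 64])

Since this keeps the channel count at 32, layer4 would not need to change.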