I'm just experimenting with PyTorch, trying to estimate the affine transformation matrix between a given pair of images (an original image and its transformed version). In this example I only use a small 5x5 grid as the original image, and a straight line tilted at 45 degrees as the transformed output. For some reason the loss does go down and the gradients keep shrinking (as expected), but the solution it converges to seems way off (it doesn't look like a straight line at all).
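(For context, here is my understanding of the two ops involved, as a minimal identity-transform sanity check that is not part of the experiment itself; as far as I can tell, theta maps the output grid coordinates into the input image's normalized [-1, 1] coordinate space:)

import torch
import torch.nn.functional as F

# Sanity check: an identity theta should reproduce the input image.
img = torch.eye(5).reshape(1, 1, 5, 5)               # 1x1x5x5 diagonal test image
identity_theta = torch.tensor([[[1., 0., 0.],
                                [0., 1., 0.]]])      # shape (1, 2, 3)
grid = F.affine_grid(identity_theta, img.size())     # shape (1, 5, 5, 2)
out = F.grid_sample(img, grid)
print(torch.allclose(out, img))                      # expected: True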
import numpy as np
import matplotlib.pyplot as plt
import torch.optim as optim
import torch
import torch.nn as nn
import torch.nn.functional as F
torch.manual_seed(989)
# source_image = torch.tensor([[0,1,0],[0,1,0],[0,1,0]])
source_image = torch.tensor([[0,0,1,0,0],[0,0,1,0,0],[0,0,1,0,0],[0,0,1,0,0],[0,0,1,0,0]])
plt.imshow(source_image)
# transformed_image = torch.eye(3)
transformed_image = torch.eye(5)
plt.imshow(transformed_image)
source_image = source_image.reshape(1, 1, source_image.shape[0], source_image.shape[1])
transformed_image = transformed_image.reshape(1, 1, transformed_image.shape[0], transformed_image.shape[1])
source_image = source_image.type(torch.FloatTensor)
transformed_image = transformed_image.type(torch.FloatTensor)
class AffineNet(nn.Module):
    def __init__(self):
        super(AffineNet, self).__init__()
        # Learnable 2x3 affine matrix, randomly initialized
        self.M = torch.nn.Parameter(torch.randn(1, 2, 3))
    def forward(self, im):
        # Build a sampling grid from M and warp the module-level transformed_image
        # with it (the im argument is not actually used)
        flow_grid = F.affine_grid(self.M, transformed_image.size())
        transformed_flow_image = F.grid_sample(transformed_image, flow_grid, padding_mode="border")
        return transformed_flow_image
affineNet = AffineNet()
optimizer = optim.SGD(affineNet.parameters(), lr=0.01)
criterion = nn.MSELoss()
for i in range(1000):
    optimizer.zero_grad()
    output = affineNet(transformed_image)
    loss = criterion(output, source_image)
    loss.backward()
    if i % 10 == 0:
        print(i, loss.item(), affineNet.M.grad)
    optimizer.step()
print(affineNet.M)
printme = output.detach().reshape(output.shape[2], output.shape[3])
plt.imshow(printme.cpu())
If you swap in the commented-out lines and use a 3x3 grid instead of 5x5, it does seem to work properly. Can someone help me understand why that is? Playing around with the seed also seems to make a big difference.
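A rough multi-restart sketch of the check I have in mind (it assumes the AffineNet, criterion, source_image and transformed_image defined above): train from several seeds and keep the run with the lowest final loss, to see whether the 5x5 case is just landing in a bad local minimum for some initializations.

best_loss, best_M = float('inf'), None
for seed in range(10):
    torch.manual_seed(seed)
    net = AffineNet()
    opt = optim.SGD(net.parameters(), lr=0.01)
    for i in range(1000):
        opt.zero_grad()
        loss = criterion(net(transformed_image), source_image)
        loss.backward()
        opt.step()
    # Keep the parameters from the best-performing seed
    if loss.item() < best_loss:
        best_loss, best_M = loss.item(), net.M.detach().clone()
print(best_loss)
print(best_M)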