Using a PyTorch nn.Sequential model, I couldn't learn all four representations of the XOR boolean function:
import numpy as np
import torch
from torch import nn
from torch.autograd import Variable
from torch import FloatTensor
from torch import optim
use_cuda = torch.cuda.is_available()
X = xor_input = np.array([[0,0], [0,1], [1,0], [1,1]])
Y = xor_output = np.array([[0,1,1,0]]).T
# Converting the X to PyTorch-able data structure.
X_pt = Variable(FloatTensor(X))
X_pt = X_pt.cuda() if use_cuda else X_pt
# Converting the Y to PyTorch-able data structure.
Y_pt = Variable(FloatTensor(Y), requires_grad=False)
Y_pt = Y_pt.cuda() if use_cuda else Y_pt
input_dim = 2   # two binary inputs
output_dim = 1  # one binary output
hidden_dim = 5
model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
                      nn.Linear(hidden_dim, output_dim),
                      nn.Sigmoid())
criterion = nn.L1Loss()
learning_rate = 0.03
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
num_epochs = 10000
for _ in range(num_epochs):
    predictions = model(X_pt)
    loss_this_epoch = criterion(predictions, Y_pt)
    loss_this_epoch.backward()
    optimizer.step()
print([int(_pred > 0.5) for _pred in predictions], list(map(int, Y_pt)), loss_this_epoch.data[0])
After learning:
for _x, _y in zip(X_pt, Y_pt):
    prediction = model(_x)
    print('Input:\t', list(map(int, _x)))
    print('Pred:\t', int(prediction))
    print('Output:\t', int(_y))
    print('######')
[OUT]:
Input: [0, 0]
Pred: 0
Output: 0
######
Input: [0, 1]
Pred: 1
Output: 1
######
Input: [1, 0]
Pred: 0
Output: 1
######
Input: [1, 1]
Pred: 0
Output: 0
######
I tried running the same code over several random seeds, but it never managed to learn all of the XOR representations.
Without PyTorch I could easily train a model with custom derivative functions and perform the backpropagation by hand, see https://www.kaggle.io/svf/2342536/635025ecf1de59b71ea4fa03eb84f9f9/results.html#After-some-enlightenment
Why doesn't the 2-layer MLP learn the XOR representation with PyTorch?
How is the model in PyTorch:
hidden_dim = 5
model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
                      nn.Linear(hidden_dim, output_dim),
                      nn.Sigmoid())
different from the one written by hand with custom derivatives and a manually coded backpropagation and optimizer step at https://www.kaggle.com/alvations/xor-with-mlp ?
Aren't both the same perceptron network with a single hidden layer?
Oddly, adding an nn.Sigmoid() between the nn.Linear layers doesn't work:
hidden_dim = 5
model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
                      nn.Sigmoid(),
                      nn.Linear(hidden_dim, output_dim),
                      nn.Sigmoid())
criterion = nn.L1Loss()
learning_rate = 0.03
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
num_epochs = 10000
for _ in range(num_epochs):
    predictions = model(X_pt)
    loss_this_epoch = criterion(predictions, Y_pt)
    loss_this_epoch.backward()
    optimizer.step()
for _x, _y in zip(X_pt, Y_pt):
    prediction = model(_x)
    print('Input:\t', list(map(int, _x)))
    print('Pred:\t', int(prediction))
    print('Output:\t', int(_y))
    print('######')
[OUT]:
Input: [0, 0]
Pred: 0
Output: 0
######
Input: [0, 1]
Pred: 1
Output: 1
######
Input: [1, 0]
Pred: 1
Output: 1
######
Input: [1, 1]
Pred: 1
Output: 0
######
But adding an nn.ReLU() does:
model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
                      nn.ReLU(),
                      nn.Linear(hidden_dim, output_dim),
                      nn.Sigmoid())
...
for _x, _y in zip(X_pt, Y_pt):
    prediction = model(_x)
    print('Input:\t', list(map(int, _x)))
    print('Pred:\t', int(prediction))
    print('Output:\t', int(_y))
    print('######')
[OUT]:
Input: [0, 0]
Pred: 0
Output: 0
######
Input: [0, 1]
Pred: 1
Output: 1
######
Input: [1, 0]
Pred: 1
Output: 1
######
Input: [1, 1]
Pred: 1
Output: 0
######
Isn't sigmoid a non-linear activation as well?
I understand that ReLU fits the task of boolean outputs, but shouldn't the Sigmoid function produce the same/similar effect?
Running the same training 100 times:
from collections import Counter
import random
random.seed(100)
import torch
from torch import nn
from torch.autograd import Variable
from torch import FloatTensor
from torch import optim
use_cuda = torch.cuda.is_available()
all_results = []
for _ in range(100):
    hidden_dim = 2
    model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
                          nn.ReLU(),  # Does the sigmoid have a built-in bias?
                          nn.Linear(hidden_dim, output_dim),
                          nn.Sigmoid())
    criterion = nn.MSELoss()
    learning_rate = 0.03
    optimizer = optim.SGD(model.parameters(), lr=learning_rate)
    num_epochs = 3000
    for _ in range(num_epochs):
        predictions = model(X_pt)
        loss_this_epoch = criterion(predictions, Y_pt)
        loss_this_epoch.backward()
        optimizer.step()
        ##print([float(_pred) for _pred in predictions], list(map(int, Y_pt)), loss_this_epoch.data[0])
    x_pred = [int(model(_x)) for _x in X_pt]
    y_truth = list([int(_y[0]) for _y in Y_pt])
    all_results.append([x_pred == y_truth, x_pred, loss_this_epoch.data[0]])

tf, outputsss, losses__ = zip(*all_results)
print(Counter(tf))
It only managed to learn the XOR representation in 18 out of 100 runs... -_-|||
Answer 0 (score: 5)
Because nn.Linear has no built-in activation, your model is effectively a linear classifier, and XOR is the canonical example of a problem that cannot be solved with a linear classifier.
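To see this concretely, two stacked nn.Linear layers collapse into a single affine map, and the trailing Sigmoid is monotone, so it does not change the shape of the decision boundary. A small illustration of my own (not part of the original answer):
import torch
from torch import nn

lin1 = nn.Linear(2, 5)
lin2 = nn.Linear(5, 1)

# lin2(lin1(x)) = x @ (W2 @ W1).T + (W2 @ b1 + b2): still one affine map.
W = lin2.weight @ lin1.weight            # shape (1, 2)
b = lin2.weight @ lin1.bias + lin2.bias  # shape (1,)

x = torch.tensor([[1., 0.]])
print(torch.allclose(lin2(lin1(x)), x @ W.t() + b))  # prints True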
Change this:
model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
                      nn.Linear(hidden_dim, output_dim),
                      nn.Sigmoid())
to this:
model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
                      nn.Sigmoid(),
                      nn.Linear(hidden_dim, output_dim),
                      nn.Sigmoid())
Only then is your model equivalent to the one in the linked Kaggle notebook.
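For completeness, here is a minimal end-to-end sketch of training the corrected model. The loss, learning rate, epoch count and the optimizer.zero_grad() call (which the loop in the question never makes) are my own assumptions rather than part of this answer; they are just one reasonable way to train it:
import torch
from torch import nn, optim

torch.manual_seed(0)  # assumed seed, purely for reproducibility

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = torch.tensor([[0.], [1.], [1.], [0.]])

model = nn.Sequential(nn.Linear(2, 5),
                      nn.Sigmoid(),               # the non-linearity between the linear layers
                      nn.Linear(5, 1),
                      nn.Sigmoid())

criterion = nn.BCELoss()                           # assumed loss (the question used L1Loss)
optimizer = optim.SGD(model.parameters(), lr=0.5)  # assumed learning rate

for _ in range(5000):
    optimizer.zero_grad()   # clear accumulated gradients each iteration
    loss = criterion(model(X), Y)
    loss.backward()
    optimizer.step()

print((model(X) > 0.5).int().flatten().tolist())   # should usually converge to [0, 1, 1, 0]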
Answer 1 (score: 0)
Here are a few simple changes to your code that should put you on a better path. I used a ReLU activation internally, but sigmoid will also work if used correctly. In addition, if you want to try the SGD optimizer, you may need to turn the learning rate down by an order of magnitude or so.
model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
                      nn.ReLU(),
                      nn.Linear(hidden_dim, output_dim),
                      nn.Sigmoid())
if use_cuda:
    model.cuda()
criterion = nn.BCELoss()
#criterion = nn.L1Loss()
#learning_rate = 0.03
#optimizer = optim.SGD(model.parameters(), lr=learning_rate)
optimizer = optim.Adam(model.parameters())
num_epochs = 10000
for epoch in range(num_epochs):
    predictions = model(X_pt)
    loss_this_epoch = criterion(predictions, Y_pt)
    model.zero_grad()
    loss_this_epoch.backward()
    optimizer.step()
    if epoch % 1000 == 0:
        print([float(_pred) for _pred in predictions], list(map(int, Y_pt)), loss_this_epoch.data[0])
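As a quick sanity check once this loop finishes (my addition, not part of the answer above), the thresholded predictions can be compared against the targets with the same idioms used in the question:
x_pred = [int(_pred > 0.5) for _pred in model(X_pt)]
y_truth = [int(_y) for _y in Y_pt]
print('All four XOR cases correct:', x_pred == y_truth)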
Answer 2 (score: 0)
With sigmoids between the layers as well as at the end, the most important thing to take into account is to update the weights in a purely stochastic way, i.e. update after every single sample, picking a sample at random in every iteration.
Respecting that, and using a fairly large learning rate (around 1.0), I have observed that the model usually learns XOR just fine with a standard 2-layer PyTorch implementation (2-2-1 layer sizes), standard weight initialization and no regularization.
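A minimal sketch of that purely stochastic update scheme might look like the following; the loss, iteration count and seeds here are my own assumptions, not details given in this answer:
import random
import torch
from torch import nn, optim

random.seed(0)        # assumed seeds, only for reproducibility
torch.manual_seed(0)

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = torch.tensor([[0.], [1.], [1.], [0.]])

# 2-2-1 network with sigmoids between the layers and at the end, as described above.
model = nn.Sequential(nn.Linear(2, 2),
                      nn.Sigmoid(),
                      nn.Linear(2, 1),
                      nn.Sigmoid())
criterion = nn.MSELoss()                           # assumed loss
optimizer = optim.SGD(model.parameters(), lr=1.0)  # large learning rate, as suggested

for _ in range(10000):
    i = random.randrange(len(X))                   # pick one sample at random ...
    optimizer.zero_grad()
    loss = criterion(model(X[i:i+1]), Y[i:i+1])
    loss.backward()
    optimizer.step()                               # ... and update after that single sample

print([int(p > 0.5) for p in model(X)])            # hopefully [0, 1, 1, 0]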
Answer 3 (score: 0)
Your second update is getting close. Here is a notebook with a working solution: https://colab.research.google.com/github/osipov/edu/blob/master/misc/xor.ipynb
Your mistake is using sigmoid after the last linear layer, which makes it hard for the optimizer to converge to the 0 and 1 values expected in your training dataset. Recall that sigmoid only approaches 0 and 1 at negative and positive infinity, respectively.
Hence, your implementation (assuming PyTorch 1.7) should be:
import torch as pt
from torch.nn.functional import mse_loss
pt.manual_seed(33);
model = pt.nn.Sequential(
    pt.nn.Linear(2, 5),
    pt.nn.ReLU(),
    pt.nn.Linear(5, 1)
)

X = pt.tensor([[0, 0],
               [0, 1],
               [1, 0],
               [1, 1]], dtype=pt.float32)
y = pt.tensor([0, 1, 1, 0], dtype=pt.float32).reshape(X.shape[0], 1)

EPOCHS = 100
optimizer = pt.optim.Adam(model.parameters(), lr=0.03)

for epoch in range(EPOCHS):
    # forward pass
    y_est = model(X)

    # compute mean squared error loss
    loss = mse_loss(y_est, y)

    # backprop the loss gradients
    loss.backward()

    # update the model weights using the gradients
    optimizer.step()

    # empty the gradients for the next iteration
    optimizer.zero_grad()
which, once executed, trains the model so that
model(X).round().abs()
returns
tensor([[0.],
[1.],
[1.],
[0.]], grad_fn=<AbsBackward>)
which is the correct output.
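As a small follow-up of my own (not part of this answer), the rounded predictions can also be checked against the targets directly:
preds = model(X).detach().round().abs()  # same post-processing as above
print(pt.equal(preds, y))                # True when all four XOR cases match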