I wanted to fit a PyTorch feedforward network on a crafted dataset, with a dependency between the label y and the two features in the dataset. The dataset is generated with np.random.random_sample, so the features are uniformly distributed between 0 and 1, and the labels are computed with the following two functions:

sum_bin_label
sum_mod_label

With the first function, both the training and validation loss of the neural network decrease, and it eventually approximates the function at close to 100% accuracy, as expected. The second function, which uses sum and modulo(num_classes), makes no progress at all. I have tried multiple learning rates and network architectures, but none of them could fit it. I am interested in how that function can be fitted.

Below is a simple example that can be pasted directly into a Jupyter notebook or any kind of Python REPL. Thanks in advance!
Imports, functions and classes used:
import torch
import numpy as np
from sklearn.model_selection import train_test_split
import torch.utils.data as utils
DATASHAPE = (2000, 2)
NUM_CLASSES = 3
# label = int((x1 + x2) * 100) % NUM_CLASSES
def sum_mod_label(x):
    return np.array([x for x in map(
        lambda x: x % NUM_CLASSES, map(int, (x[:, 0] + x[:, 1]) * 100))])
# bin the feature sum x1 + x2 into three classes at the thresholds 0.807 and 1.169
def sum_bin_label(x):
def binit(x):
if x < 0.807:
return 0
if x < 1.169:
return 1
return 2
return np.array(
[x for x in map(lambda x: binit(x), x[:, 0] + x[:, 1])])
# dataset of random feature pairs, labeled by the given label function
class RandomModuloDataset(utils.Dataset):
def __init__(self, shape, label_fn):
self.data = np.random.random_sample(shape)
self.label = label_fn(self.data)
def __len__(self):
return len(self.data)
def __getitem__(self, idx):
return self.data[idx, :], self.label[idx]
# one hidden layer: Linear -> ReLU -> Linear -> Softmax
class FeedForward(torch.nn.Module):
def __init__(self, input_size, num_classes):
super().__init__()
self.input_size = input_size
self.num_classes = num_classes
self.relu = torch.nn.ReLU()
self.softmax = torch.nn.Softmax(dim=-1)
self.fc1 = torch.nn.Linear(
self.input_size, self.input_size)
self.fc2 = torch.nn.Linear(
self.input_size, self.num_classes)
def forward(self, x, **kwargs):
output = self.fc2(self.relu(self.fc1(x.float())))
return self.softmax(output)
# train a FeedForward net on the given loader with SGD + momentum, printing the loss per epoch
def fitit(trainloader, epochs=10):
neurons = DATASHAPE[1]
net = FeedForward(neurons, NUM_CLASSES)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
for epoch in range(epochs):
for i, data in enumerate(trainloader, 0):
inputs, labels = data
optimizer.zero_grad()
outputs = net(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
print('[%d] loss: %.3f' %
(epoch + 1, loss.item()))
Iteration with the first function (eventually converges):
sum_bin_tloader = utils.DataLoader(
RandomModuloDataset(DATASHAPE, sum_bin_label))
fitit(sum_bin_tloader, epochs=50)
[1] loss: 1.111
[2] loss: 1.133
[3] loss: 1.212
[4] loss: 1.264
[5] loss: 1.261
[6] loss: 1.199
[7] loss: 1.094
[8] loss: 1.011
[9] loss: 0.958
[10] loss: 0.922
[11] loss: 0.896
[12] loss: 0.876
[13] loss: 0.858
[14] loss: 0.844
[15] loss: 0.831
[16] loss: 0.820
[17] loss: 0.811
[18] loss: 0.803
[19] loss: 0.795
[20] loss: 0.788
[21] loss: 0.782
[22] loss: 0.776
[23] loss: 0.771
[24] loss: 0.766
[25] loss: 0.761
[26] loss: 0.757
[27] loss: 0.753
[28] loss: 0.749
[29] loss: 0.745
[30] loss: 0.741
[31] loss: 0.738
[32] loss: 0.734
[33] loss: 0.731
[34] loss: 0.728
[35] loss: 0.725
[36] loss: 0.722
[37] loss: 0.719
[38] loss: 0.717
[39] loss: 0.714
[40] loss: 0.712
[41] loss: 0.709
[42] loss: 0.707
[43] loss: 0.705
[44] loss: 0.703
[45] loss: 0.701
[46] loss: 0.699
[47] loss: 0.697
[48] loss: 0.695
[49] loss: 0.693
[50] loss: 0.691
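Iteration with the second function (does not converge); the call mirrors the one above (a reconstructed sketch, as the original snippet and its loss log are not shown here):

sum_mod_tloader = utils.DataLoader(
    RandomModuloDataset(DATASHAPE, sum_mod_label))
fitit(sum_mod_tloader, epochs=50)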
I expected to be able to fit both functions, since a neural network should be able to find any function y = f(x) that describes the dependent variable, but training on sum_mod_label makes no progress.

Using catboost I was able to get reasonable accuracy (~75% on sum_mod_label).
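For comparison, a minimal sketch of what such a catboost baseline could look like (an assumed setup, not code from the original post; iterations=500 is an arbitrary choice):

from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split

data = np.random.random_sample(DATASHAPE)
labels = sum_mod_label(data)
X_train, X_test, y_train, y_test = train_test_split(
    data, labels, test_size=0.25)

# gradient-boosted trees with an sklearn-style interface
model = CatBoostClassifier(iterations=500, verbose=False)
model.fit(X_train, y_train)
print('accuracy:', model.score(X_test, y_test))  # compare with the ~75% reported above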
Answer 0 (score: 0)
Just try plotting your data, and you will see that the two functions produce datasets of very different complexity. In the second case the classes are almost inseparable, so you need to increase the complexity of your model.

How to increase the model's complexity: for example, by adding more or wider hidden layers, as in the sketch below.
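A minimal sketch of such a capacity increase (illustrative only, not from the original answer; the hidden width of 64 is an arbitrary choice):

# a deeper, wider variant of the question's FeedForward model
# note: it returns raw logits, which is what torch.nn.CrossEntropyLoss expects
class DeeperFeedForward(torch.nn.Module):
    def __init__(self, input_size, num_classes, hidden=64):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(input_size, hidden),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden, hidden),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden, num_classes))

    def forward(self, x, **kwargs):
        return self.layers(x.float())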
Code to visualize the two datasets:
import numpy as np
import matplotlib.pyplot as plt
DATASHAPE = (2000, 2)
NUM_CLASSES = 3
def sum_mod_label(x):
return np.array([x for x in map(lambda x: x % NUM_CLASSES, map(int, (x[:, 0] + x[:, 1]) * 100))])
def sum_bin_label(x):
def binit(x):
if x < 0.807:
return 0
if x < 1.169:
return 1
return 2
return np.array([x for x in map(lambda x: binit(x), x[:, 0] + x[:, 1])])
data = np.random.random_sample(DATASHAPE)
bin_label = sum_bin_label(data)
mod_label = sum_mod_label(data)
# scatter the two features, colored by class label
def plot_data(data, label, title):
plt.figure(figsize=(9, 9))
plt.title(title)
plt.scatter(data[..., 0], data[..., 1], c=label)
plt.show()
plot_data(data, bin_label, 'sum_bin_label')
plot_data(data, mod_label, 'sum_mod_label')
Output: two scatter plots of the features colored by class label. sum_bin_label yields three clean diagonal bands, while sum_mod_label interleaves the classes in thin diagonal stripes that look inseparable at this scale (plots not reproduced here).