I am currently studying the topic of the representational (expressive) power of neural networks, and I am trying to intentionally and completely overfit a neural network, i.e., to show that the model is at least capable of building a perfect mapping between the training inputs and outputs.
The dataset I am using for this experiment is MNIST, and I am using an autoencoder (encoder/decoder) structure to check whether I can deliberately overfit this network architecture.
What I am generally interested in is which combination of latent dimensionality and number of ReLUs best enlarges the expressive power of the network, i.e., which combination minimizes the training loss (in this case I use the binary cross-entropy between x and recon_x).
The problem is that I have not managed to overfit (i.e., drive the loss close to 0).
I have tried several deep/shallow fully connected networks with different latent dimensions; the best (lowest) loss I have reached is 55, which looks far too large compared to 0.
import torch
import torch.nn as nn


class AE(nn.Module):

    def __init__(self,
                 encoder_layer_sizes,
                 latent_size,
                 decoder_layer_sizes,
                 num_labels=0):
        super().__init__()

        assert type(encoder_layer_sizes) == list
        assert type(latent_size) == int
        assert type(decoder_layer_sizes) == list

        self.latent_size = latent_size
        self.encoder = Encoder(encoder_layer_sizes, latent_size, num_labels)
        self.decoder = Decoder(decoder_layer_sizes, latent_size, num_labels)

    def forward(self, x, c=None):
        # Flatten images to (batch, 784) before the fully connected encoder.
        if x.dim() > 2:
            x = x.view(-1, 28 * 28)
        z = self.encoder(x, c)
        recon_x = self.decoder(z, c)
        return recon_x, z

    def inference(self, device, n=1, c=None):
        # Decode n samples drawn from a standard normal in latent space.
        z = torch.randn([n, self.latent_size]).to(device)
        recon_x = self.decoder(z, c)
        return recon_x
class Encoder(nn.Module):

    def __init__(self, layer_sizes, latent_size, num_labels):
        super().__init__()

        self.MLP = nn.Sequential()
        for i, (in_size, out_size) in enumerate(zip(layer_sizes[:-1],
                                                    layer_sizes[1:])):
            print(i, ": ", in_size, out_size)
            self.MLP.add_module(name="L{:d}".format(i),
                                module=nn.Linear(in_size, out_size))
            # Note: i only runs up to len(layer_sizes) - 2, so this condition is
            # always true and a ReLU follows every hidden linear layer.
            if i != len(layer_sizes):
                print("ReLU added @ Encoder")
                self.MLP.add_module(name="A{:d}".format(i),
                                    module=nn.ReLU())
                # self.MLP.add_module(name="BN{:d}".format(i),
                #                     module=nn.BatchNorm1d(out_size))

        # Final linear projection to the latent code (no activation).
        self.linear = nn.Linear(layer_sizes[-1], latent_size)

    def forward(self, x, c=None):
        x = self.MLP(x)
        z = self.linear(x)
        return z
class Decoder(nn.Module):

    def __init__(self, layer_sizes, latent_size, num_labels):
        super().__init__()

        self.MLP = nn.Sequential()
        input_size = latent_size
        for i, (in_size, out_size) in enumerate(
                zip([input_size] + layer_sizes[:-1], layer_sizes)):
            print(i, ": ", in_size, out_size)
            self.MLP.add_module(
                name="L{:d}".format(i), module=nn.Linear(in_size, out_size))
            if i + 1 < len(layer_sizes):
                # No activation directly after the latent-to-hidden linear (i == 0);
                # every other hidden linear is followed by a ReLU.
                if i != 0:
                    print("ReLU added @ Decoder")
                    self.MLP.add_module(name="A{:d}".format(i), module=nn.ReLU())
                    # self.MLP.add_module(name="BN{:d}".format(i),
                    #                     module=nn.BatchNorm1d(out_size))
            else:
                # Last layer: sigmoid so recon_x lies in [0, 1] for the BCE loss.
                print("Sig step")
                self.MLP.add_module(name="sigmoid", module=nn.Sigmoid())

    def forward(self, z, c=None):
        x = self.MLP(z)
        return x
This is the model code I use. If I pass [784, 256, 256] as `layer_sizes`, the model builds the encoder and decoder symmetrically, with a ReLU between the linear transformations of the given input/output sizes.
I have tried many different `layer_sizes`; the logs are attached below for reference.
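For reference, the way I construct and train the model looks roughly like the following. This is a minimal sketch using the `AE` class defined above: the DataLoader settings, Adam with lr=1e-3, 10 epochs, and the per-image summed BCE reduction are my assumptions here, not necessarily the exact script that produced the logs below.

```python
import torch
from torch import nn, optim
from torchvision import datasets, transforms


def train_once(layer_sizes, latent_dim, epochs=10, device="cpu"):
    """Build a symmetric AE from `layer_sizes`, train it on MNIST with
    per-image summed binary cross-entropy, and return the final-epoch loss."""
    loader = torch.utils.data.DataLoader(
        datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor()),
        batch_size=64, shuffle=True)
    model = AE(encoder_layer_sizes=list(layer_sizes),
               latent_size=latent_dim,
               decoder_layer_sizes=list(reversed(layer_sizes))).to(device)
    optimizer = optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(epochs):
        for x, _ in loader:
            x = x.to(device).view(-1, 28 * 28)
            recon_x, _ = model(x)
            loss = nn.functional.binary_cross_entropy(
                recon_x, x, reduction="sum") / x.size(0)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f"Epoch {epoch + 1:02d}/{epochs}, Loss {loss.item():.4f}")
    return loss.item()


# e.g. two hidden ReLU layers of width 256 and a 64-dimensional latent code
final_loss = train_once([784, 256, 256], latent_dim=64)
```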
## Goal of the Project
The goal of the project is to find a way to determine the `optimal number of latent dimensions`.
First, the project introduces linearity and non-linearity and postulates that a linear map corresponds to `one` dimension, and that this one dimension can be split into `two` non-overlapping dimensions by a single ReLU non-linearity.
The project therefore argues that the optimal number of latent dimensions does preliminarily `not depend on the data distribution itself`, but on `the network structure`, and more specifically on the `total number of dimensions that the model is able to express`. We will call this total number of dimensions the **model dimension**.
Once the model dimension is set, one can train the network and check whether it is possible to overfit it on the given data. If the network overfits at some point during training, it can be considered "expressive enough for the data distribution". If it does not overfit, one can enlarge the **model dimension** and retry the overfitting procedure.
## To-do
- Define "over-fit". The threshold for classifying a run as over-fit depends on the experiment.
- At which epoch of the training process should over-fitting be determined?
## Caution
It is better to use the whole dataset when determining the "model dimension", since the question is how much non-linearity the collected or targeted data domain requires.
## Convergence Determination Metric
When the epoch-average loss (EpochAVGLoss) changes by no more than 1 % over 5 consecutive epochs (checked from the first epoch onward), we consider the training loss converged.
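One way to operationalize this criterion in code (my own sketch of the stated rule, not the script that produced the logs below):

```python
def has_converged(epoch_avg_losses, window=5, rel_tol=0.01):
    """True if the epoch-average loss changed by at most rel_tol (1 %)
    relative to the epoch just before the last `window` epochs."""
    if len(epoch_avg_losses) < window + 1:
        return False
    reference = epoch_avg_losses[-(window + 1)]
    recent = epoch_avg_losses[-window:]
    return all(abs(loss - reference) / max(reference, 1e-8) <= rel_tol
               for loss in recent)
```

The converged loss reported in the experiments below would then be the epoch-average loss at the epoch where `has_converged` first returns True.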
## Experiment Workflow
#### Exp_1 : 1 ReLU applied to 256 dimensions (then a linear transformation to LatentDim)
By the assumption, the **model dimension** is 512 (256 * 2). We verify the assumption by
1) checking that the loss at a fixed training epoch decreases as LatentDim is sequentially increased
with `1 * (MLP + ReLU) + LatentDim 1`
Epoch 09/10 Batch 0937/937, Loss 165.5437
with `1 * (MLP + ReLU) + LatentDim 2`
Epoch 09/10 Batch 0937/937, Loss 150.2990
with `1 * (MLP + ReLU) + LatentDim 3`
Epoch 09/10 Batch 0937/937, Loss 133.2206
with `1 * (MLP + ReLU) + LatentDim 4`
Epoch 09/10 Batch 0937/937, Loss 138.1151
with `1 * (MLP + ReLU) + LatentDim 8`
Epoch 09/10 Batch 0937/937, Loss 110.9839
with `1 * (MLP + ReLU) + LatentDim 16`
Epoch 09/10 Batch 0937/937, Loss 89.6707
with `1 * (MLP + ReLU) + LatentDim 32`
Epoch 09/10 Batch 0937/937, Loss 72.5663
with `1 * (MLP + ReLU) + LatentDim 64`
Epoch 09/10 Batch 0937/937, Loss 54.2545
> ... since the model converges at LatentDim 64 with Loss 52, we shrink the ReLU_InputDim down to 32 (go to Exp_3)
with `1 * (MLP + ReLU) + LatentDim 128`
Epoch 09/10 Batch 0937/937, Loss 54.3565
with `1 * (MLP + ReLU) + LatentDim 256`
Epoch 09/10 Batch 0937/937, Loss 52.3050
> ... the loss must keep decreasing; write code to automate this sweep (see the sketch at the end of Exp_1)
with `1 * (MLP + ReLU) + LatentDim 512`
Epoch 09/10 Batch 0937/937, Loss 53.2412
> ... Check that for any LatentDim > 512 the loss at the fixed training epoch no longer decreases.
with `1 * (MLP + ReLU) + LatentDim 1024`
Epoch 09/10 Batch 0937/937, Loss 54.3255
> As you can see, even after `doubling` LatentDim, the loss at the fixed step does not decrease, which means the model dimension is already saturated.
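The automation requested in the note above could be a simple sweep over LatentDim that stops once enlarging the latent size no longer reduces the fixed-epoch loss. This reuses the hypothetical `train_once` helper sketched earlier; the list of latent dimensions and the 1 % saturation threshold are assumptions of mine.

```python
def sweep_latent_dims(layer_sizes=(784, 256),
                      latent_dims=(1, 2, 3, 4, 8, 16, 32, 64,
                                   128, 256, 512, 1024)):
    """Record the fixed-epoch loss for each LatentDim and stop once the loss
    no longer improves by more than 1 % (model dimension saturated)."""
    results, prev_loss = {}, None
    for latent_dim in latent_dims:
        loss = train_once(list(layer_sizes), latent_dim)
        results[latent_dim] = loss
        print(f"LatentDim {latent_dim:4d} -> Loss {loss:.4f}")
        if prev_loss is not None and loss > prev_loss * 0.99:
            print("No further decrease at the fixed epoch -> saturated")
            break
        prev_loss = loss
    return results
```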
#### Exp_2 : Introduce twice the model dimension via a second ReLU layer
with `2 * (MLP + ReLU) + LatentDim 1024`
> Epoch 09/10 Batch 0937/937, Loss 57.9039
(without bias, the sequential ReLUs do not work)
#### Exp_3 : Shrink the ReLU input dim down to 32 while keeping LatentDim 64
### Summary of Algorithm
If convergeLoss != 0:
    if modelDim > latentDim:
        enlarge latentDim
    if modelDim <= latentDim:
        increase the number of ReLUs

* modelDim = 2 * num_ReLUs
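Written out in Python, the rule might look like the sketch below. Here `model_dim` counts 2 * the number of ReLU units, i.e. 2 * the sum of the hidden widths, which appears to match the counting used in these experiments (e.g. [784, 256] -> 512, [784, 32, 32] -> 128); the doubling steps and the `eps` threshold for "loss approximately 0" are my own assumptions.

```python
def model_dim(layer_sizes):
    """Model dimension = 2 * number of ReLU units = 2 * sum of hidden widths
    (layer_sizes[0] is the 784-dimensional input and has no ReLU)."""
    return 2 * sum(layer_sizes[1:])


def next_config(layer_sizes, latent_dim, converge_loss, eps=1.0):
    """One step of the summary algorithm above."""
    if converge_loss < eps:
        # Already (nearly) zero loss: the model over-fits, keep the config.
        return layer_sizes, latent_dim
    if model_dim(layer_sizes) > latent_dim:
        # The latent code is the bottleneck: enlarge latentDim (here: double it).
        return layer_sizes, latent_dim * 2
    # Otherwise add ReLU capacity; the experiments below suggest widening the
    # hidden layers (more ReLU units) rather than stacking deeper ones.
    return [layer_sizes[0]] + [2 * h for h in layer_sizes[1:]], latent_dim
```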
To verify this rule:
@ exp latentDim 64, convergeLoss 80, layerSize [784, 32]
If we increase latentDim, convergeLoss should not drop below 80. Let's check:
@ exp latentDim 128, convergeLoss 80, layerSize [784, 32], convergeLoss 80
Now let's stack a second ReLU layer, [784, 32, 32], which presumably represents 128 dimensions:
@ exp latentDim 128, convergeLoss 80, layerSize [784, 32, 32], convergeLoss 80 (still same)
As you can see, without enlarging the foremost hidden dimension, the deeper ReLU does not help. This is consistent with Raghu et al. (2017).
Now make the network wider, e.g. [784, 64]:
@ exp_1555829642 latentDim 128, convergeLoss 80, layerSize [784, 64], the convergeLoss 65 < 80
Make it wider still, e.g. [784, 128]:
@ exp_1555829642 latentDim 128, convergeLoss 55, layerSize [784, 128], the convergeLoss 55 < 80
And wider again, e.g. [784, 256]:
@ exp_1555832143 latentDim 128, convergeLoss 55, layerSize [784, 256], the convergeLoss 55 = 55
The problem is latentDim; make sure latentDim is sufficient:
@ exp_1555832638 latentDim 256, convergeLoss 55, layerSize [784, 256], the convergeLoss 55 = 55
===> Question: how can latentDim be determined with less effort, without going through this cumbersome experimental procedure?
@ exp_1555832638 latentDim 128, convergeLoss 65, layerSize [784, 256, 256], the convergeLoss 65 > 55
@ exp_1555832638 latentDim 256, convergeLoss 65, layerSize [784, 256, 256], the convergeLoss 68 > 55
@ exp_1555832638 latentDim 64, convergeLoss 65, layerSize [784, 256, 256], the convergeLoss 68 > 55
@ exp_1555832638 latentDim 128, convergeLoss 60, layerSize [784, 256, 128], the convergeLoss 60 > 55
@ exp_1555834546 latentDim 64, convergeLoss 65, layerSize [784, 256, 256], the convergeLoss 55 = 55
=====> Decreasing latentDim makes the model learn better (Q1)
@ exp_1555834546 latentDim 32, convergeLoss 65, layerSize [784, 256, 256], the convergeLoss 60 > 55
The configurations that currently reach convergeLoss 55 are:
[784, 128], ld 128
[784, 128], ld 256
[784, 256, 256], ld 64
@ 1555843696, ld64 [784, 128, 128] convergeLoss 60>55
@ 1555844254, ld128 [784, 128, 128] convergeLoss 64>55
@ 1555844254, ld32 [784, 128, 128] convergeLoss 66>55
I don't know why, but when the network is deeper, too large a latent space decreases the learning efficiency (Q1).
@ exp_1555832638 latentDim 32, convergeLoss 65, layerSize [784, 256, 256], the convergeLoss 55 = 55
Maybe, if modelDim is too big and latentDim is too small, as seen in the [784, 32, 32] experiment, training does not work. Therefore we raised latentDim in the same setting from 128 to 256:
@ exp_1555830495 convergeLoss 80 (still same)
If anyone has succeeded, or has seen reproducible code/reports that successfully learn a strict identity mapping on MNIST with an autoencoder structure, please let me know!