I have implemented a very simple deep neural network to perform multi-label classification. An overview of the model (biases omitted for simpler visualization): it is a 3-layer deep neural network with ReLU units and sigmoid output units.
The loss function is sigmoid cross entropy and the optimizer is Adam.
When I train the NN without Dropout, I get the following results:
import tensorflow as tf

#Placeholders
x = tf.placeholder(tf.float32,[None,num_features],name='x')
y = tf.placeholder(tf.float32,[None,num_classes],name='y')
keep_prob = tf.placeholder(tf.float32,name='keep_prob')
#Layer1
WRelu1 = tf.Variable(tf.truncated_normal([num_features,num_features],stddev=1.0),dtype=tf.float32,name='wrelu1')
bRelu1 = tf.Variable(tf.zeros([num_features]),dtype=tf.float32,name='brelu1')
layer1 = tf.add(tf.matmul(x,WRelu1),bRelu1,name='layer1')
relu1 = tf.nn.relu(layer1,name='relu1')
#Layer2
WRelu2 = tf.Variable(tf.truncated_normal([num_features,num_features],stddev=1.0),dtype=tf.float32,name='wrelu2')
bRelu2 = tf.Variable(tf.zeros([num_features]),dtype=tf.float32,name='brelu2')
layer2 = tf.add(tf.matmul(relu1,WRelu2),bRelu2,name='layer2')
relu2 = tf.nn.relu(layer2,name='relu2')
#Layer3
WRelu3 = tf.Variable(tf.truncated_normal([num_features,num_features],stddev=1.0),dtype=tf.float32,name='wrelu3')
bRelu3 = tf.Variable(tf.zeros([num_features]),dtype=tf.float32,name='brelu3')
layer3 = tf.add(tf.matmul(relu2,WRelu3),bRelu3,name='layer3')
relu3 = tf.nn.relu(layer3,name='relu3')
#Out layer
Wout = tf.Variable(tf.truncated_normal([num_features,num_classes],stddev=1.0),dtype=tf.float32,name='wout')
bout = tf.Variable(tf.zeros([num_classes]),dtype=tf.float32,name='bout')
logits = tf.add(tf.matmul(relu3,Wout),bout,name='logits')
#Predictions
logits_sigmoid = tf.nn.sigmoid(logits,name='logits_sigmoid')
#Cost & Optimizer
cost = tf.losses.sigmoid_cross_entropy(y,logits)
optimizer = tf.train.AdamOptimizer(LEARNING_RATE).minimize(cost)
Evaluation results on the test data:
ROC AUC - micro average: 0.6474180196222774
ROC AUC - macro average: 0.6261438437099212
Precision - micro average: 0.5112489722699753
Precision - macro average: 0.48922193879411413
Precision - weighted average: 0.5131092162035961
Recall - micro average: 0.584640369246549
Recall - macro average: 0.55746897003228
Recall - weighted average: 0.584640369246549
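(For reference, multi-label metrics like these can be computed along the following lines; the scikit-learn calls and the random placeholder data below are illustrative, not my actual evaluation code.)

import numpy as np
from sklearn.metrics import roc_auc_score, precision_score, recall_score

# Illustrative stand-ins: `probs` would be the sigmoid outputs on the test set,
# `labels` the binary multi-label ground truth, both of shape [n_samples, num_classes].
rng = np.random.RandomState(0)
labels = rng.randint(0, 2, size=(100, 6))
probs = rng.rand(100, 6)
preds = (probs >= 0.5).astype(int)  # threshold each class independently

print('ROC AUC - micro average:', roc_auc_score(labels, probs, average='micro'))
print('ROC AUC - macro average:', roc_auc_score(labels, probs, average='macro'))
print('Precision - micro average:', precision_score(labels, preds, average='micro'))
print('Recall - micro average:', recall_score(labels, preds, average='micro'))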
When I train this NN with Dropout layers added, I get the following results:
#Placeholders
x = tf.placeholder(tf.float32,[None,num_features],name='x')
y = tf.placeholder(tf.float32,[None,num_classes],name='y')
keep_prob = tf.placeholder(tf.float32,name='keep_prob')
#Layer1
WRelu1 = tf.Variable(tf.truncated_normal([num_features,num_features],stddev=1.0),dtype=tf.float32,name='wrelu1')
bRelu1 = tf.Variable(tf.zeros([num_features]),dtype=tf.float32,name='brelu1')
layer1 = tf.add(tf.matmul(x,WRelu1),bRelu1,name='layer1')
relu1 = tf.nn.relu(layer1,name='relu1')
#DROPOUT
relu1 = tf.nn.dropout(relu1,keep_prob=keep_prob,name='relu1drop')
#Layer2
WRelu2 = tf.Variable(tf.truncated_normal([num_features,num_features],stddev=1.0),dtype=tf.float32,name='wrelu2')
bRelu2 = tf.Variable(tf.zeros([num_features]),dtype=tf.float32,name='brelu2')
layer2 = tf.add(tf.matmul(relu1,WRelu2),bRelu2,name='layer2')
relu2 = tf.nn.relu(layer2,name='relu2')
#DROPOUT
relu2 = tf.nn.dropout(relu2,keep_prob=keep_prob,name='relu2drop')
#Layer3
WRelu3 = tf.Variable(tf.truncated_normal([num_features,num_features],stddev=1.0),dtype=tf.float32,name='wrelu3')
bRelu3 = tf.Variable(tf.zeros([num_features]),dtype=tf.float32,name='brelu3')
layer3 = tf.add(tf.matmul(relu2,WRelu3),bRelu3,name='layer3')
relu3 = tf.nn.relu(layer3,name='relu3')
#DROPOUT
relu3 = tf.nn.dropout(relu3,keep_prob=keep_prob,name='relu3drop')
#Out layer
Wout = tf.Variable(tf.truncated_normal([num_features,num_classes],stddev=1.0),dtype=tf.float32,name='wout')
bout = tf.Variable(tf.zeros([num_classes]),dtype=tf.float32,name='bout')
logits = tf.add(tf.matmul(relu3,Wout),bout,name='logits')
#Predictions
logits_sigmoid = tf.nn.sigmoid(logits,name='logits_sigmoid')
#Cost & Optimizer
cost = tf.losses.sigmoid_cross_entropy(y,logits)
optimizer = tf.train.AdamOptimizer(LEARNING_RATE).minimize(cost)
Evaluation results on the test data:
ROC AUC - micro average: 0.5
ROC AUC - macro average: 0.5
Precision - micro average: 0.34146163499985405
Precision - macro average: 0.34146163499985405
Precision - weighted average: 0.3712475781926326
Recall - micro average: 1.0
Recall - macro average: 1.0
Recall - weighted average: 1.0
As you can see from the Recall values of the Dropout version, the NN outputs 1 for every class of every sample, i.e. it always predicts the positive class.
Admittedly this is not an easy problem, but after applying Dropout I expected at least similar results, not worse ones than without Dropout, and certainly not this saturated output.
Why is this happening? How can I avoid this behavior? Do you see anything strange or wrong in the code?
Hyperparameters:
Dropout (keep_prob): 0.5 during training / 1.0 at inference (see the sketch right after this list)
Epochs: 500
Learning rate: 0.0001
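A minimal sketch of how I feed keep_prob (session and batch variable names are illustrative):

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Training step: keep each activation with probability 0.5.
    sess.run(optimizer, feed_dict={x: x_batch, y: y_batch, keep_prob: 0.5})
    # Evaluation / inference: disable dropout by keeping everything.
    test_probs = sess.run(logits_sigmoid, feed_dict={x: x_test, keep_prob: 1.0})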
Dataset info:
Number of instances: 22,000+
Number of classes: 6
Thanks!
Answer (score: 0):
In the end I managed to solve my own problem with some more experimentation, so this is what I came up with.
I exported the TensorBoard graph together with the weight, bias, and activation data, in order to explore them in TensorBoard.
Then I realized the weights did not look right.
The weight histograms were not changing at all over training; in other words, those layers were not learning anything.
But the answer was right there in the summaries: the distribution of the weights was far too wide. The histogram range spanned roughly [-2, 2], which is much too large.
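Roughly, the summaries I am referring to look like this (summary names and the log directory are illustrative):

# Histogram summaries for the weight matrices.
tf.summary.histogram('wrelu1', WRelu1)
tf.summary.histogram('wrelu2', WRelu2)
tf.summary.histogram('wrelu3', WRelu3)
merged = tf.summary.merge_all()
writer = tf.summary.FileWriter('./logs', tf.get_default_graph())
# Inside the training loop:
#   summary = sess.run(merged, feed_dict={...})
#   writer.add_summary(summary, global_step=epoch)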
Then I realized that I was initializing the weight matrices with truncated_normal(mean=0.0, stddev=1.0), which is a very high standard deviation for a proper initialization. The obvious fix was to initialize the weights with a more sensible scheme, so I switched to Xavier/Glorot initialization, after which the weights looked much healthier.
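For example, something along these lines (tf.contrib.layers.xavier_initializer is one TF 1.x way to get Glorot initialization; the exact call shown is illustrative):

# Glorot-initialized weight matrices instead of truncated_normal(stddev=1.0).
xavier = tf.contrib.layers.xavier_initializer()
WRelu1 = tf.get_variable('wrelu1', shape=[num_features, num_features], initializer=xavier)
WRelu2 = tf.get_variable('wrelu2', shape=[num_features, num_features], initializer=xavier)
WRelu3 = tf.get_variable('wrelu3', shape=[num_features, num_features], initializer=xavier)
Wout = tf.get_variable('wout', shape=[num_features, num_classes], initializer=xavier)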
The predictions are no longer all positive; they are mixed predictions again. And of course, thanks to Dropout, the performance on the test set is better.
In summary, the network without Dropout was able to learn something despite the overly wide initialization, but the network with Dropout could not, and needed a better initialization to avoid getting stuck.
Thanks to everyone who read the post and commented.