Custom dropout in Tensorflow

Time: 2017-01-11 05:47:22

Tags: tensorflow neural-network dropout

I'm training a DNN model on some data, and I'm hoping to analyze the learned weights to learn something about the real system I'm studying (signaling cascades in biology). I suppose you could say I'm using an artificial neural network to learn about a biological neural network.

For each of my training examples, I have deleted a single gene that is responsible for signaling at the top layer.

When I model this signaling cascade as a NN and delete a node in the first hidden layer, I realized I'm effectively doing a real-life version of dropout.

I would therefore like to use dropout to train my model, but the dropout implementations I've seen online seem to drop nodes at random. What I need is a way to specify which node to drop out for each training example.

Any advice on how to implement this? I'm open to any package, but everything I've done so far is in Tensorflow, so I'd appreciate a solution that uses that framework.

For those who prefer the details spelled out:

I have 10 input variables, fully connected to 32 relu nodes in the first layer, which are fully connected to a second layer (relu), which is fully connected to the output (linear, since I'm doing regression).

Besides the 10 input variables, I also happen to know which of the 32 nodes should be dropped out.

Is there a way to specify this during training?

Here is the code I'm currently using:

import tflearn

num_stresses = 10
num_kinase = 32
num_transcription_factors = 200
num_genes = 6692

# Build neural network
# Input variables (10)
# Which node to drop out (32)
stress = tflearn.input_data(shape=[None, num_stresses])
kinase_deletion = tflearn.input_data(shape=[None, num_kinase])

# This is the layer that I want to perform selective dropout on,
# I should be able to specify which of the 32 nodes should output zero
# based on a 1X32 vector of ones and zeros. 
kinase = tflearn.fully_connected(stress, num_kinase, activation='relu')

transcription_factor = tflearn.fully_connected(kinase, num_transcription_factors, activation='relu')

gene = tflearn.fully_connected(transcription_factor, num_genes, activation='linear')

adam = tflearn.Adam(learning_rate=0.00001, beta1=0.99)

regression = tflearn.regression(gene, optimizer=adam, loss='mean_square', metric='R2')

# Define model
model = tflearn.DNN(regression, tensorboard_verbose=1)

2 Answers:

Answer 0 (score: 3):

I would feed in, along with your input variables, a vector of equal size that is all ones except for a zero at the node you want to drop.

Then the very first operation should be to multiply the gene to be dropped by zero. From there on, it should be exactly the same as what you have now.

You can either do the multiplication (zeroing out the gene) before handing the data to tensorflow, or add another placeholder and feed the mask into the graph via the feed_dict along with your other variables. The latter is probably the better option.
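To make the placeholder idea concrete, here's a minimal sketch in plain tensorflow (not tflearn); the names x, mask, W, b, batch_x, and batch_mask are just illustrative:

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 10])     # input variables
mask = tf.placeholder(tf.float32, [None, 32])  # per-example 1/0 keep mask

W = tf.Variable(tf.truncated_normal([10, 32], stddev=0.1))
b = tf.Variable(tf.zeros([32]))
hidden = tf.nn.relu(tf.matmul(x, W) + b)
hidden_dropped = tf.multiply(hidden, mask)     # zeroes exactly the chosen nodes

# build the rest of the graph on hidden_dropped, then at train time:
# sess.run(train_op, feed_dict={x: batch_x, mask: batch_mask})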

If you need to drop out a hidden node (in layer 2), it's just another vector of ones and zeros.
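Building those per-example masks is simple numpy; a sketch, assuming nodes_to_drop holds the index of the node to drop for each training example (the name and values are made up):

import numpy as np

nodes_to_drop = [3, 0, 7]  # illustrative: one node index per example
batch_masks = np.ones((len(nodes_to_drop), 32), dtype='float32')
batch_masks[np.arange(len(nodes_to_drop)), nodes_to_drop] = 0.0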

Let me know if that works, or if you need more help.

Edit: Okay, so I haven't actually worked with tflearn (I've only used regular tensorflow), but I think you can combine tensorflow and tflearn. Basically, I added tf.multiply. You may have to add another tflearn.input_data(shape=[num_stresses]) and tflearn.input_data(shape=[num_kinase]) to give you placeholders for stresses_dropout_vector and kinase_dropout_vector. And of course, you can change the number and positions of the zeros in those two vectors.

import tensorflow as tf ###### New ######
import tflearn

num_stresses = 10
num_kinase = 32
num_transcription_factors = 200
num_genes = 6692

desired_node_to_drop = 0        ###### NEW ###### example index; set to the input you want to drop
desired_hidden_node_to_drop = 0 ###### NEW ###### example index; set to the hidden node you want to drop

stresses_dropout_vector = [1] * num_stresses ###### NEW ######
stresses_dropout_vector[desired_node_to_drop] = 0 ###### NEW ######

kinase_dropout_vector = [1] * num_kinase ###### NEW ######
kinase_dropout_vector[desired_hidden_node_to_drop] = 0 ###### NEW ######

# Build neural network
# Input variables (10)
# Which node to drop out (32)
stress = tflearn.input_data(shape=[None, num_stresses])
kinase_deletion = tflearn.input_data(shape=[None, num_kinase])

# This is the layer that I want to perform selective dropout on,
# I should be able to specify which of the 32 nodes should output zero
# based on a 1X32 vector of ones and zeros. 

stress_dropout = tf.multiply(stress, stresses_dropout_vector) ###### NEW ###### Drops out an input
kinase = tflearn.fully_connected(stress_dropout, num_kinase, activation='relu') ### changed stress to stress_dropout
kinase_dropout = tf.multiply(kinase, kinase_dropout_vector) ###### NEW ###### Drops out a hidden node

transcription_factor = tflearn.fully_connected(kinase_dropout, num_transcription_factors, activation='relu') ### changed kinase to kinase_dropout

gene = tflearn.fully_connected(transcription_factor, num_genes, activation='linear')

adam = tflearn.Adam(learning_rate=0.00001, beta1=0.99)

regression = tflearn.regression(gene, optimizer=adam, loss='mean_square', metric='R2')

# Define model
model = tflearn.DNN(regression, tensorboard_verbose=1)

If mixing in the tensorflow op doesn't work, you'll just have to find a regular old tflearn equivalent of multiply that does element-wise multiplication of two given tensors/vectors.

Hope that helps.

Answer 1 (score: 1):

For completeness, here is my final implementation:

import numpy as np
import pandas as pd
import tflearn
import tensorflow as tf

meta = pd.read_csv('../../input/nn/meta.csv')
experiments = meta["Unnamed: 0"]  # first column holds the experiment labels
del meta["Unnamed: 0"]

stress_one_hot = pd.get_dummies(meta["train"])

kinase_deletion = pd.get_dummies(meta["Strain"])
kinase_one_hot = 1 - kinase_deletion  # invert: 1 = keep the node, 0 = drop it

expression = pd.read_csv('../../input/nn/data.csv')
genes = expression["Unnamed: 0"]
del expression["Unnamed: 0"] # This holds the gene names just so you know...

expression = expression.transpose()

# Set up data for tensorflow
# Gene expression
# Standardize each gene to zero mean, unit variance
target = np.array(expression, dtype='float32')
target_mean = target.mean(axis=0, keepdims=True)
target_std = target.std(axis=0, keepdims=True)
target = target - target_mean
target = target / target_std

# Stress information
data1 = stress_one_hot
data1 = np.array(data1, dtype='float32')
data_mean = data1.mean(axis=0, keepdims=True)
data_std = data1.std(axis=0, keepdims=True)
data1 = data1 - data_mean
data1 = data1 / data_std

# Kinase information
data2 = kinase_one_hot
data2 = np.array(data2, dtype='float32')

# For Reference
# data1.shape
# #(301, 10)
# data2.shape
# #(301, 29)


# Build the Neural Network

num_stresses = 10
num_kinase = 29
num_transcription_factors = 200
num_genes = 6692

# Build neural network
# Input variables (10)
# Which node to drop out (29)
stress = tflearn.input_data(shape=[None, num_stresses])
kinase_deletion = tflearn.input_data(shape=[None, num_kinase])  # fed with the 1/0 keep mask (data2)

# This is the layer that I want to perform selective dropout on;
# I specify which of the 29 nodes should output zero
# based on a 1x29 vector of ones and zeros (kinase_deletion above)
kinase = tflearn.fully_connected(stress, num_kinase, activation='relu')
kinase_dropout = tf.mul(kinase, kinase_deletion)  # tf.mul was renamed tf.multiply in TensorFlow 1.0

transcription_factor = tflearn.fully_connected(kinase_dropout, num_transcription_factors, activation='relu')

gene = tflearn.fully_connected(transcription_factor, num_genes, activation='linear')

adam = tflearn.Adam(learning_rate=0.00001, beta1=0.99)

regression = tflearn.regression(gene, optimizer=adam, loss='mean_square', metric='R2')

# Define model
model = tflearn.DNN(regression, tensorboard_verbose=1)

# Start training (apply gradient descent algorithm)
model.fit([data1, data2], target, n_epoch=20000, show_metric=True, shuffle=True)#,validation_set=0.05)
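And a hypothetical usage sketch (the index 5 and the variable names are made up for illustration): after training, you can ask the model for predictions with a particular kinase zeroed out:

# Predict expression with one specific kinase deleted
stress_vec = data1[:1]                                 # any standardized stress input
keep_mask = np.ones((1, num_kinase), dtype='float32')
keep_mask[0, 5] = 0.0                                  # zero out kinase index 5
prediction = model.predict([stress_vec, keep_mask])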