From DNA sequences, my one-hot encoding for each base (A, C, G, T, N) is:
{'A': [1, 0, 0, 0, 0],
'C': [0, 1, 0, 0, 0],
'G': [0, 0, 1, 0, 0],
'T': [0, 0, 0, 1, 0],
'N': [0, 0, 0, 0, 1]}
Each DNA sequence is 400 bases long, so my final training data has the shape X_train.shape = (111453, 400, 5)
(111453 rows, 400 letters, each letter encoded as a 5-element vector).
My label data is a simple yes/no: [1] if the DNA sequence has an error, [0] otherwise, so Y_train.shape is (111453, 1).
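For context, a minimal sketch of how that encoding could be produced (encode_sequence and the ONE_HOT name are mine; the mapping itself is just the table above):

import numpy as np

ONE_HOT = {'A': [1, 0, 0, 0, 0],
           'C': [0, 1, 0, 0, 0],
           'G': [0, 0, 1, 0, 0],
           'T': [0, 0, 0, 1, 0],
           'N': [0, 0, 0, 0, 1]}

def encode_sequence(seq):
    """Map a 400-character DNA string to a (400, 5) float32 array."""
    return np.array([ONE_HOT[base] for base in seq], dtype=np.float32)

# Stacking 111453 encoded sequences gives X_train.shape == (111453, 400, 5).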
I am trying to build a small NN with TensorFlow:
layer_1_nodes = 5
layer_2_nodes = 10
layer_3_nodes = 5
learning_rate = 0.001
training_epochs = 5
number_of_outputs = 1
# Input Layer
with tf.variable_scope('input'):
    X = tf.placeholder(tf.float32, shape=(None, 400, 5), name="X")

# Layer 1
with tf.variable_scope('layer_1'):
    weights = tf.get_variable("weights1", shape=[400, 5, layer_1_nodes],
                              initializer=tf.contrib.layers.xavier_initializer())
    biases = tf.get_variable(name="biases1", shape=[layer_1_nodes],
                             initializer=tf.zeros_initializer())
    layer_1_output = tf.nn.relu(tf.matmul(X, weights) + biases)

# Layer 2
with tf.variable_scope('layer_2'):
    weights = tf.get_variable("weights2", shape=[400, layer_1_nodes, layer_2_nodes],
                              initializer=tf.contrib.layers.xavier_initializer())
    biases = tf.get_variable(name="biases2", shape=[layer_2_nodes],
                             initializer=tf.zeros_initializer())
    layer_2_output = tf.nn.relu(tf.matmul(layer_1_output, weights) + biases)

# Layer 3
with tf.variable_scope('layer_3'):
    weights = tf.get_variable("weights3", shape=[400, layer_2_nodes, layer_3_nodes],
                              initializer=tf.contrib.layers.xavier_initializer())
    biases = tf.get_variable(name="biases3", shape=[layer_3_nodes],
                             initializer=tf.zeros_initializer())
    layer_3_output = tf.nn.relu(tf.matmul(layer_2_output, weights) + biases)

with tf.variable_scope('layer_drop'):
    dropout = tf.layers.dropout(inputs=layer_3_output, rate=0.4)

# Output Layer
with tf.variable_scope('output'):
    weights = tf.get_variable("weights4", shape=[400, layer_3_nodes, number_of_outputs],
                              initializer=tf.contrib.layers.xavier_initializer())
    biases = tf.get_variable(name="biases4", shape=[number_of_outputs],
                             initializer=tf.zeros_initializer())
    prediction = tf.matmul(dropout, weights) + biases

with tf.variable_scope('cost'):
    Y = tf.placeholder(tf.float32, shape=(None, 1), name="Y")
    cost = tf.reduce_mean(tf.squared_difference(prediction, Y))
But I keep getting errors about tensor shapes, either at the first matmul in layer 1 or at squared_difference() in the cost (Incompatible shapes: [400,400,1] <- the prediction tensor vs. [111453,1]).
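For what it's worth, with 3-D weight tensors tf.matmul performs a batched matrix multiply, so the leading 400 of each weight tensor collides with the batch dimension. A minimal sketch of one way to make the shapes line up, assuming a plain fully connected net is the intent: flatten each (400, 5) sequence into one 2000-element vector so every weight matrix is 2-D. Only the first hidden layer and the output are sketched, and the sigmoid cross-entropy loss is a swapped-in choice for the yes/no label (squared error would also run, but is unusual for binary classification):

import tensorflow as tf

layer_1_nodes = 5
number_of_outputs = 1

with tf.variable_scope('input'):
    X = tf.placeholder(tf.float32, shape=(None, 400, 5), name="X")
    # Flatten (None, 400, 5) -> (None, 2000): one feature vector per sequence.
    X_flat = tf.reshape(X, [-1, 400 * 5])

with tf.variable_scope('layer_1'):
    weights = tf.get_variable("weights1", shape=[400 * 5, layer_1_nodes],
                              initializer=tf.contrib.layers.xavier_initializer())
    biases = tf.get_variable("biases1", shape=[layer_1_nodes],
                             initializer=tf.zeros_initializer())
    layer_1_output = tf.nn.relu(tf.matmul(X_flat, weights) + biases)  # (None, 5)

with tf.variable_scope('output'):
    weights = tf.get_variable("weights_out", shape=[layer_1_nodes, number_of_outputs],
                              initializer=tf.contrib.layers.xavier_initializer())
    biases = tf.get_variable("biases_out", shape=[number_of_outputs],
                             initializer=tf.zeros_initializer())
    prediction = tf.matmul(layer_1_output, weights) + biases          # (None, 1)

with tf.variable_scope('cost'):
    Y = tf.placeholder(tf.float32, shape=(None, 1), name="Y")
    cost = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=Y, logits=prediction))

The middle layers would follow the same 2-D pattern, e.g. weights2 with shape [layer_1_nodes, layer_2_nodes].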
Trying a Keras model instead:
input_shape = (400, 5, 1)
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
but I can't get the input shape right.
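A minimal sketch of one way the Keras route could work, assuming the data stays 3-D: treat the 400 positions as the single spatial axis and the 5 one-hot components as channels, and use Conv1D with input_shape=(400, 5) so X_train can be fed as-is; the filter count, kernel size, pooling layer, and final sigmoid unit are my choices, not from the original:

from keras.models import Sequential
from keras.layers import Conv1D, GlobalMaxPooling1D, Dense

model = Sequential()
# 400 positions along one spatial axis, 5 one-hot channels per position,
# so X_train with shape (111453, 400, 5) needs no reshaping.
model.add(Conv1D(32, kernel_size=3, activation='relu', input_shape=(400, 5)))
model.add(GlobalMaxPooling1D())
model.add(Dense(1, activation='sigmoid'))   # yes/no output

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(X_train, Y_train, epochs=5, batch_size=128)

If Conv2D with input_shape=(400, 5, 1) is kept instead, the data would first need a trailing channel axis, e.g. X_train = X_train[..., np.newaxis], giving (111453, 400, 5, 1).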