I am using https://github.com/dennybritz/cnn-text-classification-tf and have edited some of the code for my own use. However, with my edits training is far too slow (about 2 minutes per epoch), whereas without the edits each epoch takes less than a second. I have a numpy array x_train of shape (209, 20000) and a numpy array y_train of shape (209, 2). Could it be that my numpy arrays are too large?
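To put a number on "too large", this is roughly how I check the in-memory size of the two arrays (just a diagnostic snippet; x_train and y_train are the arrays mentioned above):

# Diagnostic only: print shape, dtype and raw memory footprint of the inputs.
for name, arr in (("x_train", x_train), ("y_train", y_train)):
    print("{}: shape={} dtype={} {:.1f} MB".format(
        name, arr.shape, arr.dtype, arr.nbytes / 1e6))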
The function train_step is the one that takes too much time, and within it this is the slow line:

_, step, summaries, loss, accuracy = sess.run(
    [train_op, global_step, train_summary_op, cnn.loss, cnn.accuracy],
    feed_dict)
EDIT: To clarify, what I changed from the original code is that I use my own preprocessed data instead of data_helpers.py. I read strings of size 20000 from 2 text files and put them into a list of length 209. I then load that data in train.py, convert the list to a numpy array, and reshape it afterwards:
x = np.array(x_text)
x = np.reshape(x, (209, 20000))
The rest of the code should be the same.
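For context, my loading step amounts to something like the following (a rough sketch only; the file names are placeholders and the exact encoding of each line is simplified):

# Placeholder reconstruction of my loading step, for illustration only.
# "examples_a.txt" and "examples_b.txt" stand in for my two input files.
with open("examples_a.txt") as f_a, open("examples_b.txt") as f_b:
    x_text = f_a.read().splitlines() + f_b.read().splitlines()  # 209 entries

x = np.array([list(s) for s in x_text])  # each entry has length 20000
x = np.reshape(x, (209, 20000))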
def train_step(x_batch, y_batch):
    """
    A single training step
    """
    feed_dict = {
        cnn.input_x: x_batch,
        cnn.input_y: y_batch,
        cnn.dropout_keep_prob: FLAGS.dropout_keep_prob
    }
    # The slow part is the line below
    _, step, summaries, loss, accuracy = sess.run(
        [train_op, global_step, train_summary_op, cnn.loss, cnn.accuracy],
        feed_dict)
    time_str = datetime.datetime.now().isoformat()
    print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))
    train_summary_writer.add_summary(summaries, step)

# Generate batches
batches = data_helpers.batch_iter(
    list(zip(x_train, y_train)), FLAGS.batch_size, FLAGS.num_epochs)
# Training loop. For each batch...
for batch in batches:
    x_batch, y_batch = zip(*batch)
    train_step(x_batch, y_batch)
    current_step = tf.train.global_step(sess, global_step)
    if current_step % FLAGS.evaluate_every == 0:
        print("\nEvaluation:")
        dev_step(x_dev, y_dev, writer=dev_summary_writer)
        print("")
    if current_step % FLAGS.checkpoint_every == 0:
        path = saver.save(sess, checkpoint_prefix, global_step=current_step)
        print("Saved model checkpoint to {}\n".format(path))