I'm trying to run the demo convolutional neural network from the code samples of the book Practical Convolutional Neural Networks by Sewak et al. It's a simple dog/cat classifier using TensorFlow. The problem is that I'm running this TensorFlow code in a Jupyter notebook, and the kernel keeps dying when I execute the cell that starts training the network. I'm not sure whether this is a notebook problem, whether something is missing from the demo code, or whether this is a known issue and I simply shouldn't be training from a Jupyter notebook.
So let me give some details about the environment. I have a Docker container with TensorFlow GPU, Keras, and the other CUDA libraries installed. My machine has 3 GPUs. Miniconda is installed inside the container, so I can load and run notebooks and so on.
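As a first sanity check I list the devices TensorFlow can actually see from inside the container (a minimal sketch using the TF 1.x API that the book's code is written against):

```python
from tensorflow.python.client import device_lib

# List every device TensorFlow can see inside the container; with 3 GPUs
# there should be one /device:GPU:n entry per card, plus the CPU.
# If the GPUs are missing, the CUDA setup in the container is the suspect.
print(device_lib.list_local_devices())
```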
Below are some of my thoughts on what might be causing the notebook's Python 3.6 kernel to die. I'm not very familiar with TensorFlow yet, so I don't know the source of the problem, and since the code runs inside a container the usual debugging tools are more limited.
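One thought: TF 1.x pre-allocates nearly all memory on every visible GPU as soon as the session is created, and on a 3-GPU box an out-of-memory failure inside Docker tends to kill the notebook kernel silently. A sketch of how I could create the session with memory growth enabled instead (the `session` name matches the one used by the training code below; whether this is actually the cause is exactly what I'm unsure about):

```python
import tensorflow as tf

# Allow on-demand GPU memory growth instead of grabbing everything
# up front, and pin the session to a single GPU.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.visible_device_list = "0"  # use only the first GPU

session = tf.Session(config=config)
```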
The full training code is in this GitHub repository: https://github.com/PacktPublishing/Practical-Convolutional-Neural-Networks/blob/master/Chapter03/Dog_cat_classification/CNN_DogvsCat_Classifier.py
Below is the optimize function used for training. I'm not sure whether anyone can spot something specific that's missing.
import time
from datetime import timedelta  # both imported at module level in the full script

def optimize(num_iterations):
    # Ensure we update the global variable rather than a local copy.
    global total_iterations

    # Start-time used for printing time-usage below.
    start_time = time.time()

    best_val_loss = float("inf")
    patience = 0

    for i in range(total_iterations, total_iterations + num_iterations):
        # Get a batch of training examples.
        # x_batch now holds a batch of images and
        # y_true_batch are the true labels for those images.
        x_batch, y_true_batch, _, cls_batch = data.train.next_batch(train_batch_size)
        x_valid_batch, y_valid_batch, _, valid_cls_batch = data.valid.next_batch(train_batch_size)

        # Convert shape from [num examples, rows, columns, depth]
        # to [num examples, flattened image shape].
        x_batch = x_batch.reshape(train_batch_size, img_size_flat)
        x_valid_batch = x_valid_batch.reshape(train_batch_size, img_size_flat)

        # Put the batch into a dict with the proper names
        # for placeholder variables in the TensorFlow graph.
        feed_dict_train = {x: x_batch, y_true: y_true_batch}
        feed_dict_validate = {x: x_valid_batch, y_true: y_valid_batch}

        # Run the optimizer using this batch of training data.
        # TensorFlow assigns the variables in feed_dict_train
        # to the placeholder variables and then runs the optimizer.
        session.run(optimizer, feed_dict=feed_dict_train)

        # Print status at the end of each epoch (defined as a full pass through the training set).
        if i % int(data.train.num_examples / batch_size) == 0:
            val_loss = session.run(cost, feed_dict=feed_dict_validate)
            epoch = int(i / int(data.train.num_examples / batch_size))
            acc, val_acc = print_progress(epoch, feed_dict_train, feed_dict_validate, val_loss)
            msg = "Epoch {0} --- Training Accuracy: {1:>6.1%}, Validation Accuracy: {2:>6.1%}, Validation Loss: {3:.3f}"
            print(msg.format(epoch + 1, acc, val_acc, val_loss))
            print(acc)
            acc_list.append(acc)
            val_acc_list.append(val_acc)
            iter_list.append(epoch + 1)

            # Simple early stopping on the validation loss.
            if early_stopping:
                if val_loss < best_val_loss:
                    best_val_loss = val_loss
                    patience = 0
                else:
                    patience += 1
                if patience == early_stopping:
                    break

    # Update the total number of iterations performed.
    total_iterations += num_iterations

    # Ending time.
    end_time = time.time()

    # Difference between start and end times.
    time_dif = end_time - start_time

    # Print the time usage.
    print("Time elapsed: " + str(timedelta(seconds=int(round(time_dif)))))