Question

I'm learning machine translation in Keras using the code from this article. The article's code works as-is on both GPU and CPU.

Now I want to take advantage of the Google Colab TPU. The code can't be converted for the TPU as-is, so I need to move in the TF direction.

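Following the Fashion MNIST notebook for TPUs, I'm using the Keras layers inside TensorFlow rather than the other way around. Before getting to the TPU part, I'm doing this conversion first to see whether it still runs on the GPU. This mainly means changing this function from: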

from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import Embedding
from keras.layers import RepeatVector
from keras.layers import TimeDistributed
# define NMT model
def define_model(src_vocab, tar_vocab, src_timesteps, tar_timesteps, n_units):
    model = Sequential()
    model.add(Embedding(src_vocab, n_units, input_length=src_timesteps, mask_zero=True))
    model.add(LSTM(n_units))
    model.add(RepeatVector(tar_timesteps))
    model.add(LSTM(n_units, return_sequences=True))
    model.add(TimeDistributed(Dense(tar_vocab, activation='softmax')))
    return model

To:

import tensorflow as tf
# define NMT model
def define_model(src_vocab, tar_vocab, src_timesteps, tar_timesteps, n_units):
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Embedding(src_vocab, n_units, input_length=src_timesteps, mask_zero=True))
    model.add(tf.keras.layers.LSTM(n_units))
    model.add(tf.keras.layers.RepeatVector(tar_timesteps))
    model.add(tf.keras.layers.LSTM(n_units, return_sequences=True))
    model.add(tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(tar_vocab, activation='softmax')))
    return model
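Because the conversion only changes import paths, both versions should produce the same layer stack and the same shapes. A minimal pure-Python sketch that traces those shapes without TensorFlow (trace_shapes is a hypothetical helper, and None stands for the batch dimension; src_vocab is left out because it only sizes the Embedding weight matrix, not any activation shape):

```python
def trace_shapes(tar_vocab, src_timesteps, tar_timesteps, n_units):
    """Hypothetical helper: list (layer, output shape) for the define_model stack.
    None is the unknown batch dimension; mask_zero does not change any shape."""
    return [
        # Embedding: integer ids (batch, src_timesteps) -> vectors of size n_units
        ("Embedding", (None, src_timesteps, n_units)),
        # Encoder LSTM without return_sequences keeps only the last hidden state
        ("LSTM", (None, n_units)),
        # RepeatVector copies that state once per target time step
        ("RepeatVector", (None, tar_timesteps, n_units)),
        # Decoder LSTM with return_sequences=True emits one state per step
        ("LSTM(return_sequences=True)", (None, tar_timesteps, n_units)),
        # TimeDistributed(Dense(softmax)) gives a distribution over tar_vocab per step
        ("TimeDistributed(Dense)", (None, tar_timesteps, tar_vocab)),
    ]


# Example with made-up sizes (not taken from the article's data)
for name, shape in trace_shapes(tar_vocab=2000, src_timesteps=10, tar_timesteps=8, n_units=256):
    print(name, shape)
```

The last shape is what trainY has to match during fit.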

Then I run:

model = define_model(swh_vocab_size, eng_vocab_size, swh_length, eng_length, 256)
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(trainX, trainY, epochs=1, batch_size=64, validation_data=(testX, testY), callbacks=[checkpoint], verbose=2)
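With loss='categorical_crossentropy' and a softmax over tar_vocab at every step, trainY and testY must be one-hot arrays of shape (samples, eng_length, eng_vocab_size). The question doesn't show how they were built; assuming they start as integer token ids, a NumPy sketch of the encoding (one_hot_targets is a hypothetical name):

```python
import numpy as np


def one_hot_targets(sequences, vocab_size):
    """Turn integer-encoded target sequences of shape (samples, timesteps)
    into one-hot float arrays of shape (samples, timesteps, vocab_size)."""
    sequences = np.asarray(sequences, dtype=np.int64)
    # Row k of the identity matrix is the one-hot vector for token id k,
    # so fancy-indexing with the id array builds the whole tensor at once.
    return np.eye(vocab_size, dtype=np.float32)[sequences]


y = one_hot_targets([[1, 0, 2], [2, 2, 0]], vocab_size=3)
print(y.shape)
```

If the targets are kept as integer ids instead, Keras also offers loss='sparse_categorical_crossentropy', which avoids materializing the one-hot tensor.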

However, when I run it, it first emits this warning:

lib\site-packages\tensorflow\python\ops\gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
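This warning most likely comes from the Embedding layer: the gradient of an embedding lookup only touches the rows whose ids occurred in the batch, so TensorFlow stores it sparsely as IndexedSlices (row indices plus row values). Converting it to a dense tensor scatters those rows into a full (vocab, n_units) matrix. A NumPy analogue of that densification (an illustration, not TensorFlow's actual implementation; densify is a hypothetical name):

```python
import numpy as np


def densify(indices, values, vocab_size):
    """Scatter-add sparse gradient rows into a dense (vocab_size, n_units) matrix,
    mimicking what converting IndexedSlices to a dense tensor produces."""
    values = np.asarray(values, dtype=np.float32)
    dense = np.zeros((vocab_size, values.shape[1]), dtype=np.float32)
    # np.add.at accumulates correctly when the same id appears more than once
    np.add.at(dense, np.asarray(indices), values)
    return dense


grad = densify(indices=[1, 3, 1], values=[[1, 1], [2, 2], [3, 3]], vocab_size=5)
print(grad)
```

The dense result always has vocab_size × n_units entries, however few ids the batch used, which is why the message warns about memory. On its own it is a warning, not the error that stops training.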

Then, during fit on the GPU, it fails with a BLAS GEMM launch error:

InternalError: Blas GEMM launch failed : a.shape=(64, 256), b.shape=(256, 256), m=64, n=256, k=256
     [[{{node lstm/while/MatMul}} = MatMul[T=DT_FLOAT, _class=["loc:@training/Adam/gradients/lstm/while/strided_slice_grad/StridedSliceGrad"], transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](lstm/while/TensorArrayReadV3, lstm/while/strided_slice)]]
     [[{{node loss/time_distributed_loss/broadcast_weights/assert_broadcastable/AssertGuard/Assert/Switch/_175}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2728_...ert/Switch", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
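The failing node, lstm/while/MatMul, is an ordinary matrix product inside the first LSTM's time loop. a.shape=(64, 256) appears to be one time step of input (64 = batch_size, 256 = the embedding size, which equals n_units here), and b.shape=(256, 256) a 256×256 block of the LSTM weights. A NumPy check that these shapes are compatible for the reported m, n, k (zero arrays stand in for the real values):

```python
import numpy as np

batch_size, n_units = 64, 256  # from fit(batch_size=64) and define_model(..., 256)

a = np.zeros((batch_size, n_units), dtype=np.float32)  # one time step of LSTM input
b = np.zeros((n_units, n_units), dtype=np.float32)     # one 256x256 weight block
c = a @ b  # GEMM: (m x k) @ (k x n)

m, k = a.shape
n = b.shape[1]
print(m, n, k, c.shape)
```

Since the inner dimensions agree (k = 256 on both sides), this is not a shape bug in the model. The failure is in launching the GEMM on the GPU, which suggests the cuBLAS/GPU environment rather than the keras-to-tf.keras change.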

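This is before converting to a TPU model. I just want to make sure everything still runs on CPU and GPU before the final TPU conversion, and it doesn't. Any ideas why I can't get this far?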

Answer 1

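I think some of this may come down to installing Anaconda Python on Windows carefully. I believe this is the correct order (assuming you have already installed CUDA 9.0 and cuDNN):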

Add the paths as described in this question.

Install the Visual Studio version that matches the one TensorFlow was built with, and add

C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC

to PATH.
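To confirm the VC directory really is on PATH in the shell that launches Python, a small check can help. This is a sketch with a hypothetical helper name; it uses the standard-library ntpath module so the comparison follows Windows rules (case-insensitive, trailing backslash ignored) and splits on ";" as Windows PATH does:

```python
import ntpath
import os

VC_DIR = r"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC"


def dir_on_path(target, path_value=None):
    """Return True if `target` is one of the entries of a Windows-style PATH string.
    Defaults to the current process's PATH environment variable."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    want = ntpath.normcase(ntpath.normpath(target))
    return any(
        ntpath.normcase(ntpath.normpath(entry.strip())) == want
        for entry in path_value.split(";")
        if entry.strip()
    )


print(dir_on_path(VC_DIR))
```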

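And per this: run vcvarsall in your script before starting Python. Then:

Open a CMD window with "Run as administrator". This is critical.
conda create --name myenv
conda activate myenv
conda install tensorflow-gpu
conda install mingw
conda install libpython
conda install mkl-service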
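Running the training script from the wrong environment silently undoes these steps. A sketch of a guard to put at the top of the script, assuming a recent conda where `conda activate` sets the CONDA_DEFAULT_ENV variable to the environment name (check_conda_env is a hypothetical helper):

```python
import os


def check_conda_env(expected="myenv", environ=None):
    """Return the active conda environment name, raising if it is not `expected`.
    Relies on CONDA_DEFAULT_ENV, which `conda activate` sets in recent conda versions."""
    env = os.environ if environ is None else environ
    active = env.get("CONDA_DEFAULT_ENV")
    if active != expected:
        raise RuntimeError(f"expected conda env {expected!r}, found {active!r}")
    return active
```

Calling check_conda_env() before importing tensorflow fails fast with a clear message instead of a later GPU error.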

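I'll mark this as correct after some more testing. Steps 3 and 4 come from this question; the idea is to start from scratch and use conda install strictly, rather than the pip install from this question.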

Converting code from keras to tf.keras causes problems
