Question

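I'm trying to run Keras with the TensorFlow backend on Cloud ML (Google Cloud Platform). Keras does not seem to be using the GPU: one epoch takes 190 seconds on my CPU, and I see the same timing in the job's log dump. Is there a way to tell from within Keras whether the code is running on the GPU or the CPU? Has anyone tried running Keras with the TensorFlow backend on Cloud ML?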

Answer 1

Update: As of March 2017, GPUs are publicly available. See Fuyang Liu's answer.

~~GPUs are not currently available on CloudML. However, they will be in the coming months.~~

Answer 2

Yes, it is supported now.

Basically, you need to add a file such as cloudml-gpu.yaml to your module, with the following content:

trainingInput:
  scaleTier: CUSTOM
  # standard_gpu provides 1 GPU. Change to complex_model_m_gpu for 4 GPUs
  masterType: standard_gpu
  runtimeVersion: "1.0"

Then add the option --config=trainer/cloudml-gpu.yaml (assuming your training module is in a folder named trainer). For example:

export BUCKET_NAME=tf-learn-simple-sentiment
export JOB_NAME="example_5_train_$(date +%Y%m%d_%H%M%S)"
export JOB_DIR=gs://$BUCKET_NAME/$JOB_NAME
export REGION=europe-west1

gcloud ml-engine jobs submit training $JOB_NAME \
  --job-dir $JOB_DIR \
  --runtime-version 1.0 \
  --module-name trainer.example5-keras \
  --package-path ./trainer \
  --region $REGION \
  --config=trainer/cloudml-gpu.yaml \
  -- \
  --train-file gs://tf-learn-simple-sentiment/sentiment_set.pickle

You may also want to check this url for the regions where GPUs are available and other details.

Answer 3

import tensorflow as tf
from keras import backend as K
# Log the device each op is placed on (TF 1.x session API).
K.set_session(tf.Session(config=tf.ConfigProto(log_device_placement=True)))

This should make Keras print the device placement of every tensor to stdout or stderr.
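Device placement logs can be long, so a quicker first check is to ask TensorFlow which devices it can see at all. The following is a minimal sketch, not part of the original answer: the helper names `gpu_device_names` and `list_tensorflow_devices` are made up for illustration, and it assumes `device_lib.list_local_devices()` from `tensorflow.python.client` is available in the installed TensorFlow version.

```python
def gpu_device_names(devices):
    """Return the names of all devices whose device_type is "GPU"."""
    return [d.name for d in devices if d.device_type == "GPU"]


def list_tensorflow_devices():
    """Ask TensorFlow which local devices it sees; [] if TF is not installed."""
    try:
        # Assumption: this module path exists in the installed TF version.
        from tensorflow.python.client import device_lib
    except ImportError:
        return []
    return device_lib.list_local_devices()


if __name__ == "__main__":
    gpus = gpu_device_names(list_tensorflow_devices())
    # An empty list means TensorFlow (and therefore Keras) will run on the CPU.
    print("GPUs visible to TensorFlow:", gpus if gpus else "none")
```

If this prints "none" at the start of a Cloud ML job, the job was most likely not scheduled on a GPU machine type, and the config file from Answer 2 is the first thing to check.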
