I want to run distributed training (multi-node) with Keras, where each node has multiple GPUs. I tried something like:
model = multi_gpu_model(model, gpus=4)
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])
estimator = model_to_estimator(model, model_dir=args.model_dir)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
It raises the following exception:
InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(31, 784), b.shape=(784, 512), m=31, n=512, k=784
[[Node: sequential_1/dense_1_2/MatMul = MatMul[T=DT_FLOAT, _class=["loc:@training/RMSprop/gradients/sequential_1/dense_1_2/MatMul_grad/MatMul_1"], transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](lambda_4/Slice, dense_1/kernel/read)]]
[[Node: training/RMSprop/add/_171 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_463_training/RMSprop/add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Is there a way to make this work, or to use MirroredStrategy together with train_and_evaluate? Thanks.
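For context, what I was hoping would work is something along these lines; this is an untested sketch, assuming the `train_distribute` argument of `tf.estimator.RunConfig` (added around TF 1.8) accepts a `MirroredStrategy`, and reusing the `model`, `args`, `train_spec`, and `eval_spec` from above:

```python
import tensorflow as tf

# Sketch: instead of wrapping the model with multi_gpu_model, pass a
# distribution strategy to the Estimator through its RunConfig.
strategy = tf.contrib.distribute.MirroredStrategy(num_gpus=4)
config = tf.estimator.RunConfig(train_distribute=strategy)

# Convert the plain (unwrapped) Keras model to an Estimator with that config.
estimator = tf.keras.estimator.model_to_estimator(
    keras_model=model,
    model_dir=args.model_dir,
    config=config)

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
```

I'm not sure whether this is the intended usage, or whether it would avoid the Blas GEMM error above.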