I have to test the distributed version of TensorFlow across multiple GPUs.
I ran the CIFAR-10 multi-GPU example on an AWS g2.8xlarge EC2 instance.
The running time for 2000 steps of cifar10_multi_gpu_train.py (code here) was 427 seconds with 1 GPU (flag num_gpu=1). Afterwards, the eval.py script returned a precision @ 1 of 0.537.
Running the same example for the same number of steps (with each step executed in parallel across all GPUs), but with 4 GPUs (flag num_gpu=4), the running time was about 530 seconds, and the eval.py script returned only a slightly higher precision @ 1 of 0.552 (maybe due to randomness in the computation?).
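For reference, the invocations were roughly as follows (the exact flag names are from my setup and may differ slightly in the current version of the tutorial scripts):

```
# Train on 1 GPU for 2000 steps
python cifar10_multi_gpu_train.py --num_gpu=1 --max_steps=2000

# Train on 4 GPUs for the same number of steps
python cifar10_multi_gpu_train.py --num_gpu=4 --max_steps=2000

# Evaluate the checkpoint written by training
python eval.py
```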
Why is the example performing worse with a higher number of GPUs? I used a very small number of steps for testing purposes and was expecting a much larger gain in precision with 4 GPUs. Did I miss something or make some basic mistake? Has anyone else tried the above example?
Thank you very much.