I am trying to use Ray with TensorFlow by following the tutorial (link), and I get a TuneError:
Result logdir: ray_results/tune_gan_test
Number of trials: 2 ({'ERROR': 2})
ERROR trials:
- train_gan_0_partition=0: ERROR, 1 failures: ray_results/tune_gan_test/train_gan_0_partition=0_2019-04-05_16-25-5536of9abi/error_2019-04-05_16-26-02.txt
- train_gan_1_partition=1: ERROR, 1 failures: ray_results/tune_gan_test/train_gan_1_partition=1_2019-04-05_16-26-1038hprt_a/error_2019-04-05_16-26-12.txt
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/1 GPUs
Memory usage on this node: 53.0/67.5 GB
Result logdir: ray_results/tune_gan_test
Number of trials: 2 ({'ERROR': 2})
ERROR trials:
- train_gan_0_partition=0: ERROR, 1 failures: ray_results/tune_gan_test/train_gan_0_partition=0_2019-04-05_16-25-5536of9abi/error_2019-04-05_16-26-02.txt
- train_gan_1_partition=1: ERROR, 1 failures: ray_results/tune_gan_test/train_gan_1_partition=1_2019-04-05_16-26-1038hprt_a/error_2019-04-05_16-26-12.txt
Traceback (most recent call last):
  File "train.py", line 142, in <module>
    **gan_spec)
  File "/lib/python3.6/site-packages/ray/tune/tune.py", line 253, in run
    raise TuneError("Trials did not complete", errored_trials)
ray.tune.error.TuneError: ('Trials did not complete', [train_gan_0_partition=0, train_gan_1_partition=1])
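The TuneError above is only a summary; the per-trial error_*.txt files listed in the log should contain the underlying traceback. Below is a minimal sketch for printing them, assuming the result directory is the one shown in the "Result logdir" line. The helper name collect_trial_errors is hypothetical and not part of Ray.

```python
import glob
import os


def collect_trial_errors(logdir):
    """Return {path: text} for every error_*.txt file found under logdir."""
    pattern = os.path.join(logdir, "**", "error_*.txt")
    errors = {}
    # recursive=True lets "**" match any depth of trial subdirectories.
    for path in sorted(glob.glob(pattern, recursive=True)):
        with open(path, "r") as handle:
            errors[path] = handle.read()
    return errors


if __name__ == "__main__":
    # Directory taken from the "Result logdir" line in the log above.
    for path, text in collect_trial_errors("ray_results/tune_gan_test").items():
        print("=== " + path)
        print(text)
```

Posting the contents of those files would probably make the question much easier to answer.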
The code related to the Ray usage:
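# Imports this excerpt relies on (main() is defined elsewhere in train.py).
import os

import ray
import tensorflow as tf
from ray import tune
from ray.tune import grid_search, register_trainable


# !!! Entrypoint for ray.tune !!!
def train(config={'partition': 0}, reporter=None):
    # Hand the reporter and the partition to main() through module globals.
    global status_reporter, partition_fn
    status_reporter = reporter
    partition_fn = config['partition']
    tf.app.run(main=main)


# !!! Example of using the ray.tune Python API !!!
if __name__ == "__main__":
    try:
        register_trainable('train_gan', train)
        gan_spec = {
            'stop': {
                'time_total_s': 600,
            },
            'config': {
                'partition': grid_search([0, 1]),
            },
        }
        ray.init()
        tune.run('train_gan',
                 name='tune_gan_test',
                 resources_per_trial={"gpu": 1},
                 raise_on_failed_trial=True,
                 queue_trials=True,
                 with_server=False,
                 **gan_spec)
    except KeyboardInterrupt:
        # Exit immediately on Ctrl-C (the original os._exists(1) was a typo).
        os._exit(1)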
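One thing that may be worth ruling out (an assumption, not a confirmed diagnosis): in TensorFlow 1.x, tf.app.run finishes by calling sys.exit on whatever main returns, so a function that wraps it never returns normally to its caller. The sketch below uses only the standard library to show that behaviour and one way to contain it. The names exiting_runner and run_without_exit are hypothetical stand-ins and are not TensorFlow or Ray APIs.

```python
import sys


def exiting_runner(main):
    # Hypothetical stand-in for a launcher that ends with sys.exit(main(argv)).
    sys.exit(main(["prog"]))


def run_without_exit(runner, main):
    """Call runner(main); return the exit code instead of propagating SystemExit."""
    try:
        runner(main)
    except SystemExit as exc:
        return exc.code
    return None


def main(argv):
    # Placeholder for the real training entry point.
    return 0


if __name__ == "__main__":
    print(run_without_exit(exiting_runner, main))
```

If this turns out to be relevant, calling main directly from train, instead of going through a launcher that exits the process, might be the simpler option.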
How can I fix this? Thanks for your help :)