'No such file or directory' error after submitting a training job

Asked: 2016-09-29 16:21:47

Tags: google-cloud-ml

I run:

gcloud beta ml jobs submit training ${JOB_NAME} --config config.yaml

After about 5 minutes the job fails with this error:

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 232, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 228, in main
    run_training()
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 129, in run_training
    data_sets = input_data.read_data_sets(FLAGS.train_dir, FLAGS.fake_data)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py", line 212, in read_data_sets
    with open(local_file, 'rb') as f:
IOError: [Errno 2] No such file or directory: 'gs://my-bucket/mnist/train/train-images.gz'

The odd thing is that, as far as I can tell, the file does exist at that URL.

2 Answers:

Answer 0: (score: 1)

This error usually means you are using a multi-regional GCS bucket for your output. To avoid it, use a regional GCS bucket. Regional buckets provide stronger consistency guarantees, which avoids this kind of error.

For details on setting up GCS buckets correctly for Cloud ML, see the Cloud ML Docs.
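
If you want to double-check how your bucket is configured, something like this should work (a minimal sketch assuming the google-cloud-storage client library is installed; 'my-bucket' is a placeholder for your bucket name):

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('my-bucket')  # placeholder bucket name
# Regional buckets report a specific region such as 'US-CENTRAL1';
# multi-regional buckets report 'US', 'EU', or 'ASIA'.
print(bucket.location)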

Answer 1: (score: 1)

Plain Python IO does not know how to handle GCS gs:// paths. You need something like:

from tensorflow.python.lib.io import file_io

first_data_file = args.train_files[0]
# file_io.FileIO understands gs:// paths, unlike the built-in open()
file_stream = file_io.FileIO(first_data_file, mode='r')

# run experiment
model.run_experiment(file_stream)
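
The same module also lets you check that the trainer can actually see the object from the traceback above (a quick sketch; the gs:// path is the one from the question):

from tensorflow.python.lib.io import file_io

# Unlike the built-in open(), file_io understands gs:// paths,
# so this prints True if the object is visible to the job.
print(file_io.file_exists('gs://my-bucket/mnist/train/train-images.gz'))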

But, somewhat ironically, you can also copy the file out of the gs:// bucket to local disk, where your program can actually see it:

import matplotlib as mpl
import matplotlib.pyplot  # imported so that mpl.pyplot is usable below

# presentation_mplstyle_path holds a 'gs://...' path to the .mplstyle file
with file_io.FileIO(presentation_mplstyle_path, mode='r') as input_f:
    with file_io.FileIO('presentation.mplstyle', mode='w+') as output_f:
        output_f.write(input_f.read())

mpl.pyplot.style.use(['./presentation.mplstyle'])

And at the end, copy the file from local disk back up to the gs:// bucket:

# report_name is a local file; job_dir is a 'gs://...' output directory
with file_io.FileIO(report_name, mode='r') as input_f:
    with file_io.FileIO(job_dir + '/' + report_name, mode='w+') as output_f:
        output_f.write(input_f.read())

IMO this should be easier.
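
For what it's worth, file_io also exposes a copy helper that can replace both read/write loops above (a sketch, assuming a TensorFlow version where file_io.copy is available; the paths are placeholders):

from tensorflow.python.lib.io import file_io

# Works in either direction between local paths and gs:// paths
file_io.copy('gs://my-bucket/presentation.mplstyle', 'presentation.mplstyle', overwrite=True)
file_io.copy('report.txt', 'gs://my-bucket/output/report.txt', overwrite=True)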