I run:
gcloud beta ml jobs submit training ${JOB_NAME} --config config.yaml
After about 5 minutes, the job fails with this error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 232, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 228, in main
    run_training()
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 129, in run_training
    data_sets = input_data.read_data_sets(FLAGS.train_dir, FLAGS.fake_data)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py", line 212, in read_data_sets
    with open(local_file, 'rb') as f:
IOError: [Errno 2] No such file or directory: 'gs://my-bucket/mnist/train/train-images.gz'
The strange thing is that, as far as I can tell, the file does exist at that URL.
Answer 0 (score: 1)
This error usually indicates that you are using a multi-regional GCS bucket for your output. To avoid this error, you should use a regional GCS bucket. Regional buckets provide stronger consistency guarantees, which avoids these kinds of errors.

For details on setting up GCS buckets correctly for Cloud ML, see the Cloud ML Docs.
Answer 1 (score: 1)
Plain Python file IO doesn't know how to handle GCS gs:// paths. You need something like:

from tensorflow.python.lib.io import file_io

first_data_file = args.train_files[0]
file_stream = file_io.FileIO(first_data_file, mode='r')

# run the experiment on the streamed file
model.run_experiment(file_stream)
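The streaming pattern above can be sketched in a self-contained way. `file_io.FileIO` returns an object with the same read interface as the builtin `open`, so a local file stands in for the gs:// path here; the `parse_rows` helper and the sample data are illustrative, not part of the original answer:

```python
import csv
import os
import tempfile

def parse_rows(file_stream):
    # Consume any file-like object; on Cloud ML this would be the stream
    # returned by file_io.FileIO(gs_path, mode='r').
    return [row for row in csv.reader(file_stream)]

# Local stand-in for a gs:// training file.
tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.csv', delete=False)
tmp.write('a,b\n1,2\n')
tmp.close()

with open(tmp.name, 'r') as file_stream:  # file_io.FileIO(...) on Cloud ML
    rows = parse_rows(file_stream)
print(rows)  # [['a', 'b'], ['1', '2']]

os.unlink(tmp.name)
```

Because `file_io.FileIO` mirrors the builtin interface, code written this way needs no changes beyond swapping the opener when it runs on Cloud ML.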
Somewhat ironically, you can also copy the file from the gs:// bucket into the working directory, where your program can actually see it (here presentation_mplstyle_path is a string holding the gs:// URL):

with file_io.FileIO(presentation_mplstyle_path, mode='r') as input_f:
    with file_io.FileIO('presentation.mplstyle', mode='w+') as output_f:
        output_f.write(input_f.read())

mpl.pyplot.style.use(['./presentation.mplstyle'])
Finally, copy the file from the working directory back to the gs:// bucket:

with file_io.FileIO(report_name, mode='r') as input_f:
    with file_io.FileIO(job_dir + '/' + report_name, mode='w+') as output_f:
        output_f.write(input_f.read())
IMO, this should be easier.
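The copy-in/copy-out dance above can be wrapped in one small helper. A minimal sketch, assuming the caller passes file_io.FileIO as open_fn when running on Cloud ML; the builtin open is the default so the sketch runs locally, and stage_file is a hypothetical name, not a Cloud ML API:

```python
import os
import tempfile

def stage_file(src_path, dst_path, open_fn=open):
    # Read src_path fully and rewrite it at dst_path using open_fn.
    # On Cloud ML, pass open_fn=file_io.FileIO so gs:// paths work in
    # both directions; the builtin open only handles local paths.
    with open_fn(src_path, 'r') as input_f:
        with open_fn(dst_path, 'w+') as output_f:
            output_f.write(input_f.read())
    return dst_path

# Local round trip standing in for gs:// -> local -> gs://.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'report.txt')
with open(src, 'w') as f:
    f.write('summary: ok\n')

local_copy = stage_file(src, os.path.join(workdir, 'local_report.txt'))
with open(local_copy) as f:
    print(f.read())  # summary: ok
```

The same helper then covers both directions of the answer's workflow: staging inputs down before training and pushing reports back up afterwards.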