TensorFlow object detection training job fails on Google Cloud

Time: 2017-12-15 05:36:16

Tags: linux tensorflow google-cloud-platform object-detection-api

My Google Storage bucket is laid out as follows:

-data
--labels.pbtxt
--train.record
--test.record
-training
--config file
--packages

On my local machine, the data under /tensorflow/models/research/object_detection is laid out the same way, with one addition:

-training
--cloud.yml

I am running the following command to start a job on Google Cloud ML Engine:

gcloud ml-engine jobs submit training object_detection_0.1 \
    --job-dir=gs://{BUCKET NAME}/training \
    --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
    --module-name object_detection.train \
    --region us-central1 \
    --config /##/##/models/research/object_detection/training \
    -- \
    --train_dir=gs://{BUCKET NAME}/training \
    --pipeline_config_path=gs://{BUCKET NAME}/training/config_file.config

The Google Cloud logs show the following error:

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/object_detection/train.py", line 49, in <module>
    from object_detection import trainer
  File "/root/.local/lib/python2.7/site-packages/object_detection/trainer.py", line 33, in <module>
    from deployment import model_deploy
ImportError: No module named deployment

Replica workers 0, 1, 2, 3: same error.

The replica worker 4 exited with a non-zero status of 1. Termination reason: Error.
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/object_detection/train.py", line 49, in <module>
    from object_detection import trainer
  File "/root/.local/lib/python2.7/site-packages/object_detection/trainer.py", line 33, in <module>
    from deployment import model_deploy
ImportError: No module named deployment

Replica ps 0, 1: same error.

The replica ps 2 exited with a non-zero status of 1. Termination reason: Error.
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/object_detection/train.py", line 49, in <module>
    from object_detection import trainer
  File "/root/.local/lib/python2.7/site-packages/object_detection/trainer.py", line 33, in <module>
    from deployment import model_deploy
ImportError: No module named deployment

1 answer:

Answer 0 (score: 1)

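I had the same problem with the deeplab model. The failing import seems to refer to this folder, because it works once I place the folder where it can be imported correctly.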
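One way to check whether a package archive actually ships the missing module is to list its contents before submitting the job. The following is only a minimal sketch, assuming the `deployment` package is supposed to come from the slim tarball passed to `--packages` in the question. The helper name `sdist_contains_module` is hypothetical and not part of any TensorFlow or gcloud tooling.

```python
import tarfile


def sdist_contains_module(archive_path, module_relpath):
    """Return True if the .tar.gz archive has a member ending in module_relpath.

    Hypothetical helper for illustration only. An sdist usually nests its
    files under a top-level "<name>-<version>/" directory, so the match is
    done on the path suffix rather than on the full member name.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        return any(
            name == module_relpath or name.endswith("/" + module_relpath)
            for name in archive.getnames()
        )
```

For example, calling `sdist_contains_module("slim/dist/slim-0.1.tar.gz", "deployment/model_deploy.py")` from the models/research directory would report whether that tarball carries the module named in the traceback. If it returns False, the archive probably needs to be rebuilt or the module supplied through another package.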

By the way, let me know how you solved it.