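I'm an ABAP programmer learning the TensorFlow Object Detection API by following a tutorial that uses Dat Tran's Raccoon dataset (https://github.com/datitran/raccoon_dataset). Training runs on my own PC (Python 3.6.3 and TensorFlow 1.5.0), but it is very slow, so I moved it to Google Cloud ML Engine. The job keeps failing.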
The training input looks like this:
{
  "scaleTier": "CUSTOM",
  "masterType": "standard_gpu",
  "workerType": "standard_gpu",
  "parameterServerType": "standard",
  "workerCount": "9",
  "parameterServerCount": "3",
  "packageUris": [
    "gs://racoon/train/packages/363569b954c446566b767aabfeb047adb0ed2f25f83248417e2667aac70d0790/object_detection-0.1.tar.gz",
    "gs://racoon/train/packages/363569b954c446566b767aabfeb047adb0ed2f25f83248417e2667aac70d0790/slim-0.1.tar.gz"
  ],
  "pythonModule": "object_detection.train",
  "args": [
    "--train_dir=gs://racoon/train",
    "--pipeline_config_path=gs://racoon/data/ssd_mobilenet_v1_pets.config"
  ],
  "region": "us-central1",
  "runtimeVersion": "1.5",
  "jobDir": "gs://racoon/train",
  "pythonVersion": "3.5"
}
Training ran for almost 100 steps and then failed with an error. The job log shows the following:
The replica worker 1 exited with a non-zero status of 1.
Termination reason: Error.
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/.local/lib/python3.5/site-packages/object_detection/train.py", line 167, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 124, in run
    _sys.exit(main(argv))
  File "/root/.local/lib/python3.5/site-packages/object_detection/train.py", line 163, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "/root/.local/lib/python3.5/site-packages/object_detection/trainer.py", line 360, in train
    saver=saver)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 758, in train
    sys.maxint))
AttributeError: module 'sys' has no attribute 'maxint'
The replica worker 2 exited with a non-zero status of 1.
Termination reason: Error.
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/.local/lib/python3.5/site-packages/object_detection/train.py", line 167, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 124, in run
    _sys.exit(main(argv))
  File "/root/.local/lib/python3.5/site-packages/object_detection/train.py", line 163, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "/root/.local/lib/python3.5/site-packages/object_detection/trainer.py", line 360, in train
    saver=saver)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 758, in train
    sys.maxint))
AttributeError: module 'sys' has no attribute 'maxint'
The replica worker 4 exited with a non-zero status of 1.
Termination reason: Error.
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/.local/lib/python3.5/site-packages/object_detection/train.py", line 167, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 124, in run
    _sys.exit(main(argv))
  File "/root/.local/lib/python3.5/site-packages/object_detection/train.py", line 163, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "/root/.local/lib/python3.5/site-packages/object_detection/trainer.py", line 360, in train
    saver=saver)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 758, in train
    sys.maxint))
AttributeError: module 'sys' has no attribute 'maxint'
To find out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=1006195729918&resource=ml_job%2Fjob_id%2Fracoon_object_detection_9&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%22racoon_object_detection_9%22
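All three failing workers (1, 2 and 4) stop on the same last frame: line 758 of slim's learning.py evaluates `sys.maxint`, an attribute that Python 3 no longer provides. A minimal sketch that reproduces the same AttributeError under Python 3; `read_maxint` is an illustrative name, not TensorFlow code:

```python
import sys


def read_maxint():
    """Illustrative only: look up sys.maxint the way the failing frame does."""
    return sys.maxint  # exists on Python 2, removed in Python 3


try:
    read_maxint()
except AttributeError as err:
    print(err)  # module 'sys' has no attribute 'maxint'
```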
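In my local TensorFlow installation, learning.py does contain sys.maxint, and my IDE marks it as an error. Has anyone run into the same problem and found a solution? Please share it. Thank you very much.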
Answer 0 (score: 1)
sys.maxint was removed in Python 3.0, so replace it with sys.maxsize:
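The sys.maxint constant was removed, since there is no longer a limit to the value of integers. However, sys.maxsize can be used as an integer larger than any practical list or string index. It conforms to the implementation's "natural" integer size and is typically the same as sys.maxint in previous releases on the same platform (assuming the same build options).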
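Applied to the expression in the traceback, the substitution could look like the sketch below. `steps_to_wait` is a hypothetical helper that only mirrors the shape of the `min(..., sys.maxint)` call. Because learning.py lives in the TensorFlow installation under /usr/local/lib/python3.5/dist-packages on the training machines, not in the uploaded packages, it may not be editable there. The second helper, `patch_sys_maxint`, is an untested workaround under that assumption: it gives the `sys` module a `maxint` alias before training starts, for example at the top of the packaged object_detection/train.py. It is not an official fix.

```python
import sys


def steps_to_wait(startup_delay_steps, number_of_steps=None):
    """Hypothetical helper mirroring the failing expression, using sys.maxsize."""
    # With no step limit (None or 0), fall back to a bound larger than any index.
    return min(startup_delay_steps, number_of_steps or sys.maxsize)


def patch_sys_maxint():
    """Workaround sketch: alias sys.maxint to sys.maxsize on Python 3.

    Must run in the same process, before library code reads sys.maxint.
    """
    if not hasattr(sys, "maxint"):
        sys.maxint = sys.maxsize


if __name__ == "__main__":
    print(steps_to_wait(15))      # 15: no step limit, so the delay wins
    print(steps_to_wait(15, 10))  # 10: the step limit is smaller
    patch_sys_maxint()
    print(sys.maxint == sys.maxsize)  # True
```

The alias works because module attributes are looked up at the moment the line runs, so any later `sys.maxint` in the same interpreter finds the patched value.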
But it does not make sense to me that it works on your local machine.
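One possible explanation, offered as an assumption rather than a confirmed reading of TensorFlow's source: Python resolves `sys.maxint` only when the line that uses it actually runs, so learning.py can contain the name (and an IDE can flag it) without failing at import time. Every failure in the job log comes from a replica worker, which suggests the line sits on a code path that only non-chief workers take; a single local training process acts as the chief and would never reach it. The sketch models that assumption with a hypothetical `wait_before_training` function:

```python
import sys


def wait_before_training(is_chief, startup_delay_steps, number_of_steps=None):
    """Hypothetical model of a worker-only code path that reads sys.maxint."""
    if is_chief:
        # The chief (for example a single local training process) never gets here.
        return 0
    # sys.maxint is looked up only when this line runs AND number_of_steps is
    # falsy, because `or` short-circuits on a truthy left operand.
    return min(startup_delay_steps, number_of_steps or sys.maxint)


print(wait_before_training(is_chief=True, startup_delay_steps=15))  # 0
try:
    wait_before_training(is_chief=False, startup_delay_steps=15)
except AttributeError as err:
    print("worker failed:", err)
```

Defining the function raises nothing; the error appears only when a non-chief call reaches the lookup, which would match a local run succeeding while the distributed workers crash.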
Answer 1 (score: 0)
The TensorFlow Object Detection API currently supports only TensorFlow 1.2, so you need to change the runtime version to 1.2.
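Following this answer means changing only the `runtimeVersion` field of the training input. A minimal sketch, assuming the training input is kept as a JSON document like the one in the question (trimmed here to a few fields); `with_runtime_version` is a hypothetical helper, and the sketch does not check which Python versions a given runtime version accepts:

```python
import copy
import json

# Training input from the question, trimmed to a few fields for brevity.
TRAINING_INPUT = json.loads("""
{
  "scaleTier": "CUSTOM",
  "pythonModule": "object_detection.train",
  "args": ["--train_dir=gs://racoon/train"],
  "region": "us-central1",
  "runtimeVersion": "1.5",
  "pythonVersion": "3.5"
}
""")


def with_runtime_version(training_input, version):
    """Return a copy of a training-input dict with runtimeVersion replaced."""
    # Deep copy so nested lists such as "args" are not shared with the original.
    updated = copy.deepcopy(training_input)
    updated["runtimeVersion"] = version
    return updated


new_input = with_runtime_version(TRAINING_INPUT, "1.2")
print(json.dumps(new_input, indent=2))
```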