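I'm trying to train my own detector model based on the TensorFlow samples and this post. I have trained it locally on a MacBook Pro. The problem is that I don't have a GPU, and running on the CPU is far too slow (about 25 seconds per iteration).
So I tried running it on Google Cloud ML Engine following this tutorial, but I can't get it to work.
My folder structure is as follows: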
+ data
  - train.record
  - test.record
+ models
  + train
  + eval
+ training
  - ssd_mobilenet_v1_coco
The steps I took to switch from local training to Google Cloud training were:
1. Edited the pipeline.config file and changed all paths from /Users/dev/detector/ to gs://bucketname/;
2. Ran:
gcloud ml-engine jobs submit training object_detection_`date +%s` \
    --job-dir=gs://bucketname/models/train \
    --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
    --module-name object_detection.train \
    --region us-east1 \
    --config /Users/dev/detector/training/cloud.yml \
    -- \
    --train_dir=gs://bucketname/models/train \
    --pipeline_config_path=gs://bucketname/data/pipeline.config
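The two steps above can be sketched in Python. This is a minimal, hypothetical helper script (`rewrite_config_paths` and `build_submit_args` are not part of the Object Detection API); it assumes the local prefix is /Users/dev/detector/ (with the leading slash, otherwise a path like /Users/dev/detector/data would become /gs://bucketname/data) and the bucket is gs://bucketname/, as in the question. It only builds the gcloud argument list; it does not run gcloud.

```python
import time


def rewrite_config_paths(config_text, local_prefix, bucket_prefix):
    """Replace every occurrence of the local prefix with the bucket prefix."""
    # pipeline.config is a text-format protobuf, so a plain string replace
    # swaps the directory part of fields such as input_path or label_map_path.
    return config_text.replace(local_prefix, bucket_prefix)


def build_submit_args(bucket, job_name=None):
    """Build the argv list for the gcloud command shown above."""
    if job_name is None:
        # Same idea as object_detection_`date +%s`: a unique, timestamped name.
        job_name = "object_detection_%d" % int(time.time())
    train_dir = "gs://%s/models/train" % bucket
    return [
        "gcloud", "ml-engine", "jobs", "submit", "training", job_name,
        "--job-dir=" + train_dir,
        "--packages", "dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz",
        "--module-name", "object_detection.train",
        "--region", "us-east1",
        "--config", "/Users/dev/detector/training/cloud.yml",
        # Everything after the bare "--" is forwarded to object_detection.train
        # instead of being interpreted by gcloud.
        "--",
        "--train_dir=" + train_dir,
        "--pipeline_config_path=gs://%s/data/pipeline.config" % bucket,
    ]


if __name__ == "__main__":
    sample = 'input_path: "/Users/dev/detector/data/train.record"'
    print(rewrite_config_paths(sample, "/Users/dev/detector/", "gs://bucketname/"))
    print(" ".join(build_submit_args("bucketname")))
```

Passing the list to subprocess.call() would run the same command without shell quoting issues.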
Doing this, I get the following error message from MLUnits:
The replica ps 0 exited with a non-zero status of 1. Termination reason: Error.
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/object_detection/train.py", line 49, in <module>
    from object_detection import trainer
  File "/root/.local/lib/python2.7/site-packages/object_detection/trainer.py", line 27, in <module>
    from object_detection.builders import preprocessor_builder
  File "/root/.local/lib/python2.7/site-packages/object_detection/builders/preprocessor_builder.py", line 21, in <module>
    from object_detection.protos import preprocessor_pb2
  File "/root/.local/lib/python2.7/site-packages/object_detection/protos/preprocessor_pb2.py", line 71, in <module>
    options=None, file=DESCRIPTOR),
TypeError: __new__() got an unexpected keyword argument 'file'
Thanks in advance.
Answer 0 (score: 0)
Check out the solution andersskog posted here; it worked for me. I also made a patch here. To fix it manually, follow these steps:
Make sure your YAML file uses runtime version 1.4, for example:
trainingInput:
  runtimeVersion: "1.4"
  scaleTier: CUSTOM
  masterType: standard_gpu
  workerCount: 5
  workerType: standard_gpu
  parameterServerCount: 3
  parameterServerType: standard
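If you generate cloud.yml from a script, a minimal sketch could look like the following. The helper `render_training_input` is hypothetical; it avoids a YAML library by emitting the flat two-level structure of the example with string formatting. The values are the ones from the example. runtimeVersion is quoted because an unquoted 1.4 would be read by YAML as a float rather than a version string.

```python
# Keys whose values must stay strings even though they look like numbers.
QUOTED_KEYS = {"runtimeVersion"}


def render_training_input(settings):
    """Render a two-level trainingInput block in the cloud.yml format."""
    lines = ["trainingInput:"]
    for key, value in settings.items():
        if key in QUOTED_KEYS:
            value = '"%s"' % value
        lines.append("  %s: %s" % (key, value))
    return "\n".join(lines) + "\n"


# Values copied from the cloud.yml example above.
settings = {
    "runtimeVersion": "1.4",
    "scaleTier": "CUSTOM",
    "masterType": "standard_gpu",
    "workerCount": 5,
    "workerType": "standard_gpu",
    "parameterServerCount": 3,
    "parameterServerType": "standard",
}

if __name__ == "__main__":
    print(render_training_input(settings), end="")
```

This relies on dicts keeping insertion order (Python 3.7+), so the keys come out in the same order as the example.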
Change setup.py to the following:
"""Setup script for object_detection."""

import logging
import subprocess

from setuptools import find_packages
from setuptools import setup
from setuptools.command.install import install


class CustomCommands(install):
    """Runs extra apt-get commands before the normal package install."""

    def RunCustomCommand(self, command_list):
        # Run the command, capturing stdout and stderr together.
        p = subprocess.Popen(
            command_list,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT)
        stdout_data, _ = p.communicate()
        logging.info('Log command output: %s', stdout_data)
        if p.returncode != 0:
            raise RuntimeError('Command %s failed: exit code: %s' %
                               (command_list, p.returncode))

    def run(self):
        # Install python-tk on the training machine, then do the normal install.
        self.RunCustomCommand(['apt-get', 'update'])
        self.RunCustomCommand(
            ['apt-get', 'install', '-y', 'python-tk'])
        install.run(self)


REQUIRED_PACKAGES = ['Pillow>=1.0', 'protobuf>=3.3.0', 'Matplotlib>=2.1']

setup(
    name='object_detection',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    include_package_data=True,
    packages=[p for p in find_packages() if p.startswith('object_detection')],
    description='Tensorflow Object Detection Library',
    cmdclass={
        # Use the custom install command above instead of the default one.
        'install': CustomCommands,
    }
)
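The RunCustomCommand helper above is what makes the install fail loudly if apt-get fails. The same pattern can be tried locally outside setuptools. This is a minimal Python 3 sketch (the name `run_checked` is hypothetical), exercised with the current Python interpreter instead of apt-get so it runs on any machine.

```python
import logging
import subprocess
import sys


def run_checked(command_list):
    """Run a command, log its combined output, and raise on a non-zero exit."""
    p = subprocess.Popen(
        command_list,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT)  # merge stderr into stdout, as in setup.py
    stdout_data, _ = p.communicate()
    logging.info("Log command output: %s", stdout_data)
    if p.returncode != 0:
        raise RuntimeError("Command %s failed: exit code: %s"
                           % (command_list, p.returncode))
    return stdout_data


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(run_checked([sys.executable, "-c", "print('ok')"]))
    try:
        run_checked([sys.executable, "-c", "import sys; sys.exit(3)"])
    except RuntimeError as err:
        print(err)
```

Raising instead of only logging matters on ML Engine: a failed apt-get then aborts the package install with a clear message rather than surfacing later as an import error.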
In object_detection/utils/visualization_utils.py, at line 24 (before import matplotlib.pyplot as plt), add:
import matplotlib
matplotlib.use('agg')
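The reason for forcing the agg backend is that an ML Engine worker has no display, and agg renders to image buffers without one. If you would rather keep an interactive backend when developing locally, one hypothetical variant is to choose the backend from the environment. This sketch assumes a Linux/X11 host, where an unset DISPLAY variable means there is no screen; that heuristic does not hold on macOS. `pick_matplotlib_backend` is an illustrative name, not part of matplotlib.

```python
import os


def pick_matplotlib_backend(environ=None):
    """Return "agg" when no X display is available, else None (keep default).

    Heuristic for Linux/X11 only: DISPLAY is unset on headless machines
    such as a cloud training worker.
    """
    if environ is None:
        environ = os.environ
    return None if environ.get("DISPLAY") else "agg"


if __name__ == "__main__":
    backend = pick_matplotlib_backend()
    print("backend to force:", backend)
    # In visualization_utils.py this would go before importing pyplot:
    #   import matplotlib
    #   if backend:
    #       matplotlib.use(backend)
    #   import matplotlib.pyplot as plt
```

For a job that only ever runs headless, the unconditional matplotlib.use('agg') from the answer is simpler and avoids the heuristic entirely.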
In object_detection/evaluator.py, at line 184, change
tf.train.get_or_create_global_step()
to
tf.contrib.framework.get_or_create_global_step()
Finally, in object_detection/builders/optimizer_builder.py, at line 103, change
tf.train.get_or_create_global_step()
to
tf.contrib.framework.get_or_create_global_step()
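Instead of editing each call site, an alternative (not what this answer does) is a small compatibility lookup that prefers tf.train.get_or_create_global_step and falls back to the tf.contrib.framework location used above when the installed TensorFlow does not provide the former. The sketch takes the module as a parameter so it can be exercised without TensorFlow; `resolve_global_step_fn` is a hypothetical name.

```python
import types


def resolve_global_step_fn(tf):
    """Return whichever get_or_create_global_step the given tf module provides."""
    train = getattr(tf, "train", None)
    fn = getattr(train, "get_or_create_global_step", None)
    if fn is not None:
        return fn
    # Fall back to the contrib location used in the answer above.
    return tf.contrib.framework.get_or_create_global_step


if __name__ == "__main__":
    # Stand-in modules so the lookup can be tried without TensorFlow installed.
    new_tf = types.SimpleNamespace(
        train=types.SimpleNamespace(get_or_create_global_step=lambda: "train"))
    old_tf = types.SimpleNamespace(
        train=types.SimpleNamespace(),
        contrib=types.SimpleNamespace(framework=types.SimpleNamespace(
            get_or_create_global_step=lambda: "contrib")))
    print(resolve_global_step_fn(new_tf)())
    print(resolve_global_step_fn(old_tf)())
```

With the real module, each call site would become resolve_global_step_fn(tf)(), so the same code runs against both the local and the ML Engine TensorFlow versions.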
Hope this helps!
Answer 1 (score: 0)
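The problem is the protobuf version. You have probably installed the latest protoc via brew; since version 3.5.0, protobuf has added the file field (https://github.com/google/protobuf/blob/9f80df026933901883da1d556b38292e14836612/CHANGES.txt#L74), so the _pb2.py files it generates pass file=DESCRIPTOR, which an older protobuf runtime rejects with the TypeError shown in the question.
So, in the setup.py changes above, set the protobuf version in REQUIRED_PACKAGES to 'protobuf>=3.5.1'.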
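To check whether you are in this situation before submitting a job, you can compare the installed protobuf runtime against 3.5.1 and look for file=DESCRIPTOR in the generated *_pb2.py files. A minimal sketch, assuming plain numeric version strings such as 3.3.0 (pre-release tags are not handled specially) and that the generated files live in object_detection/protos; the helper names are hypothetical.

```python
import glob
import re


def parse_version(version):
    """Turn '3.5.1' into (3, 5, 1); stops at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def version_at_least(version, minimum=(3, 5, 1)):
    """True if version is >= minimum, compared component by component."""
    return parse_version(version) >= minimum


def uses_file_kwarg(pb2_source):
    """True if generated code passes file=DESCRIPTOR to descriptor constructors."""
    return "file=DESCRIPTOR" in pb2_source


def files_needing_new_runtime(proto_dir="object_detection/protos"):
    """List generated *_pb2.py files under proto_dir that use file=DESCRIPTOR."""
    hits = []
    for path in sorted(glob.glob(proto_dir + "/*_pb2.py")):
        with open(path) as f:
            if uses_file_kwarg(f.read()):
                hits.append(path)
    return hits


if __name__ == "__main__":
    try:
        from google import protobuf
        status = "ok" if version_at_least(protobuf.__version__) else "too old"
        print("protobuf", protobuf.__version__, status)
    except ImportError:
        print("protobuf is not installed")
    print(files_needing_new_runtime())
```

If files_needing_new_runtime() returns any paths, the package you upload needs the newer runtime, which is what the 'protobuf>=3.5.1' requirement asks pip to install on the workers.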