Install python-tk with apt-get before running TensorFlow on Google Cloud ML

Time: 2017-11-10 06:19:01

Tags: tensorflow google-cloud-platform google-cloud-ml

I am developing an object detector with Cloud Machine Learning Engine from a Cloud VM instance, following this tutorial (https://cloud.google.com/blog/big-data/2017/06/training-an-object-detector-using-cloud-machine-learning-engine).

When I submit the following training job, I get a module import error on Google Cloud Platform:

gcloud ml-engine jobs submit training `whoami`_object_detection_`date +%s` \
    --job-dir=${YOUR_GCS_BUCKET}/train \
    --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
    --module-name object_detection.train \
    --region us-central1 \
    --config object_detection/samples/cloud/cloud.yml \
    -- \
    --train_dir=${YOUR_GCS_BUCKET}/train \
    --pipeline_config_path=${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_coco.config

The error is as follows:

...object_detection/utils/visualization_utils.py", line 24, in <module>
import matplotlib.pyplot as plt
ImportError: No module named matplotlib.pyplot

I have already installed matplotlib with pip install, and this command runs fine: python2.7 -c 'import matplotlib.pyplot as plt'.

The matplotlib error was resolved by adding the package name to the REQUIRED_PACKAGES list in setup.py.

For reference, here is my setup.py file:

"""Setup script for object_detection."""

from setuptools import find_packages
from setuptools import setup
import subprocess

subprocess.check_call(['apt-get', 'update'])
subprocess.check_call(['apt-get', 'install', 'python-tk'])

REQUIRED_PACKAGES = ['Pillow>=1.0', 'matplotlib']

setup(
    name='object_detection',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    include_package_data=True,
    packages=[p for p in find_packages() if p.startswith('object_detection')],
    description='Tensorflow Object Detection Library',
)

However, even with that fixed, further errors appear, because matplotlib depends on the python-tk package.

ps-replica-0   Could not find a version that satisfies the requirement python-tk (from object-detection==0.1) (from versions: ) ps-replica-0 
ps-replica-0 No matching distribution found for python-tk (from object-detection==0.1) ps-replica-0 
ps-replica-0 Command '['pip', 'install', '--user', u'object_detection-0.1.tar.gz']' returned non-zero exit status 1 ps-replica-0 
ps-replica-0 Module completed; cleaning up. ps-replica-0 

But python-tk / python3-tk is not available as a pip package. To install it, we need to run sudo apt-get install python-tk or sudo apt-get install python3-tk.

Google Cloud ML runs Python 2.7, so python-tk has to be installed before the TensorFlow training program runs.

Can anyone help me instruct Cloud ML to install python-tk with apt-get before running TensorFlow?

Update_01:

I got another set of errors. They appear to be caused by python setup.py egg_info failing, and also by this:

Command '['apt-get', 'install', 'python-tk']' returned non-zero exit status 1

The error log is shown below. Thanks in advance for your help.

ps-replica-2
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-BhSDtP-build/
ps-replica-2
Command '['pip', 'install', '--user', '--upgrade', '--force-reinstall', '--no-deps', u'object_detection-0.1.tar.gz']' returned non-zero exit status 1
The replica ps 0 exited with a non-zero status of 1. Termination reason: Error. 
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-C3hdCp-build/setup.py", line 8, in <module>
subprocess.check_call(['apt-get', 'install', 'python-tk'])
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['apt-get', 'install', 'python-tk']' returned non-zero exit status 1

The replica ps 1 exited with a non-zero status of 1. Termination reason: Error. 
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-iR0TqP-build/setup.py", line 8, in <module>
subprocess.check_call(['apt-get', 'install', 'python-tk'])
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['apt-get', 'install', 'python-tk']' returned non-zero exit status 1

The replica ps 2 exited with a non-zero status of 1. Termination reason: Error. 
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-BhSDtP-build/setup.py", line 8, in <module>
subprocess.check_call(['apt-get', 'install', 'python-tk'])
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['apt-get', 'install', 'python-tk']' returned non-zero exit status 1

To find out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=640992742297&resource=ml_job%2Fjob_id%2Froot_object_detection_1510462119&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%22root_object_detection_1510462119%22" 

Job submission command:

gcloud ml-engine jobs submit training `whoami`_object_detection_`date +%s` \
    --job-dir=${YOUR_GCS_BUCKET}/train \
    --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
    --module-name object_detection.train \
    --config object_detection/samples/cloud/cloud.yml \
    -- \
    --train_dir=${YOUR_GCS_BUCKET}/train \
    --pipeline_config_path=${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_coco.config

Thanks in advance.

Update_02: Solution

Thanks to @Dennis Liu for the solution. There is no need to install the python-tk package. Beyond that, one more error appears; it can be resolved by changing tf.train.get_or_create_global_step() to tf.contrib.framework.get_or_create_global_step() at line 103 of object_detection/builders/optimizer_builder.py.
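The swap described in Update_02 can also be written as a fallback, so the same file works whichever of the two functions the installed TensorFlow exposes. This is only a sketch: `resolve_global_step_fn` is a hypothetical helper name, not part of the Object Detection API, and it takes the `tensorflow` module as an argument so that nothing is imported when the helper is defined.

```python
def resolve_global_step_fn(tf_module):
    """Return a callable that gets or creates the global step.

    Hypothetical helper: prefers tf.train.get_or_create_global_step and
    falls back to tf.contrib.framework.get_or_create_global_step when the
    installed TensorFlow does not provide the former.
    """
    train = getattr(tf_module, 'train', None)
    fn = getattr(train, 'get_or_create_global_step', None)
    if fn is not None:
        return fn
    # Older runtime: use the contrib location named in Update_02.
    return tf_module.contrib.framework.get_or_create_global_step
```

In optimizer_builder.py this would be used as `global_step = resolve_global_step_fn(tf)()`, assuming the file imports TensorFlow as `tf`.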

2 Answers:

Answer 0 (score: 1)

Call matplotlib.use('agg') immediately after importing matplotlib, before importing matplotlib.pyplot.

I switched matplotlib's backend from the Tk-based one (which needs python-tk) to agg, and that did the trick. This is the answer where I found it:

https://stackoverflow.com/a/47077614
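A minimal sketch of the backend switch, for reference. The helper names `pick_backend` and `load_pyplot` are hypothetical, and the DISPLAY check is an assumption about how to detect a headless Linux worker; calling matplotlib.use('agg') unconditionally is the simpler option.

```python
import os


def pick_backend(environ=None):
    """Return 'agg' when no X display is configured, else None (keep default)."""
    environ = os.environ if environ is None else environ
    return None if environ.get('DISPLAY') else 'agg'


def load_pyplot():
    """Select the backend first, then import pyplot."""
    import matplotlib  # imported lazily so this module loads without matplotlib
    backend = pick_backend()
    if backend is not None:
        matplotlib.use(backend)  # must happen before pyplot is imported
    import matplotlib.pyplot as plt
    return plt
```

The agg backend renders to image buffers and files only, so calls such as plt.savefig still work while nothing is shown on screen.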

Answer 1 (score: 0)

Add the following lines to setup.py:

import subprocess
subprocess.check_call(['apt-get', 'install', 'python-tk'])

and remove python-tk from REQUIRED_PACKAGES.
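A sketch of how the apt-get calls in this answer could be arranged, under assumptions the thread does not confirm: that the worker image is Debian-based, that the install process is allowed to run apt-get, and that the failure in Update_01 came from apt-get waiting for confirmation (hence the `-y` flag). `APT_COMMANDS` and `run_commands` are hypothetical names.

```python
import subprocess

# System packages to install before setup() runs. The -y flag answers the
# confirmation prompt, since nobody is there to answer it on a worker.
APT_COMMANDS = [
    ['apt-get', 'update'],
    ['apt-get', 'install', '-y', 'python-tk'],
]


def run_commands(commands, runner=subprocess.check_call):
    """Run each command in order; check_call raises on a non-zero exit."""
    for command in commands:
        runner(command)


# In setup.py this would be called once, before setup(...):
# run_commands(APT_COMMANDS)
```

Note that Update_02 in the question reports that installing python-tk turned out to be unnecessary.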