Tensorflow Saver.save cannot write to a docker shared volume

Asked: 2018-02-20 15:08:02

Tags: python docker tensorflow autoencoder

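I have a docker container with an autoencoder that can be started through a Flask server. All scripts are copied into /root inside the container, which also has access to a shared volume /data, laid out like this: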

/data
- /images
- /models
  - /Autoenc.exe.ckpt.data-00000-of-00001
  - /Autoenc.exe.ckpt.index
  - /Autoenc.exe.ckpt.meta
  - /checkpoint


/root
- MyServer.py

The server can successfully write images to the /data/images folder, but it cannot write to the /data/models directory. I instantiated the tensorflow Saver like this:

saver = tf.train.Saver()

and tried each of the following ways of writing the save files:

saver.save(sess, '/data/models/Autoenc.exe.ckpt')
saver.save(sess, '../data/models/Autoenc.exe.ckpt')
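The two prefixes above name the same location only if the process runs with /root as its working directory. A minimal sketch for checking how a relative checkpoint prefix resolves is shown here; the helper name `resolve_checkpoint_prefix` and the second working directory are hypothetical, and the resolution is purely lexical (symlinks are not followed).

```python
import posixpath


def resolve_checkpoint_prefix(prefix, cwd):
    """Return the absolute path that `prefix` refers to when the process
    runs with `cwd` as its working directory (POSIX path rules)."""
    # join() keeps an absolute prefix unchanged; normpath() collapses "..".
    return posixpath.normpath(posixpath.join(cwd, prefix))


# "/root/app" is a made-up working directory, used only for contrast.
for cwd in ("/root", "/root/app"):
    resolved = resolve_checkpoint_prefix("../data/models/Autoenc.exe.ckpt", cwd)
    print(cwd, "->", resolved)
```

Inside the running server, `os.getcwd()` reports the working directory that actually applies.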

Fun fact: when I do this instead, it runs fine:

saver.save(sess, './Autoenc.exe.ckpt')

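But this writes the files to the wrong location, one that is wiped whenever the docker container is rebuilt. When the container is built and a checkpoint is already present in the directory mentioned above, restoring it via

saver.restore(sess, "../data/models/Autoenc.exe.ckpt")

works without any problems.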

Now let me show you the error message:

2018-02-20 15:00:52.868566: W tensorflow/core/framework/op_kernel.cc:1198] Unknown: ../data/models/Autoenc.exe.ckpt.data-00000-of-00001.tempstate17405837231896083449; Input/output error
2018-02-20 15:00:53.339357: W tensorflow/core/kernels/queue_base.cc:277] _0_input_producer: Skipping cancelled enqueue attempt with queue not closed
[2018-02-20 15:00:53,590] ERROR in app: Exception on /train/ [POST]
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_call
    return fn(*args)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1329, in _run_fn
    status, run_metadata)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.UnknownError: ../data/models/Autoenc.exe.ckpt.data-00000-of-00001.tempstate17405837231896083449; Input/output error
         [[Node: save/SaveV2 = SaveV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, Variable, Variable/Adam, Variable/Adam_1, Variable_1, Variable_1/Adam, Variable_1/Adam_1, Variable_10, Variable_10/Adam, Variable_10/Adam_1, Variable_11, Variable_11/Adam, Variable_11/Adam_1, Variable_12, Variable_12/Adam, Variable_12/Adam_1, Variable_13, Variable_13/Adam, Variable_13/Adam_1, Variable_14, Variable_14/Adam, Variable_14/Adam_1, Variable_15, Variable_15/Adam, Variable_15/Adam_1, Variable_2, Variable_2/Adam, Variable_2/Adam_1, Variable_3, Variable_3/Adam, Variable_3/Adam_1, Variable_4, Variable_4/Adam, Variable_4/Adam_1, Variable_5, Variable_5/Adam, Variable_5/Adam_1, Variable_6, Variable_6/Adam, Variable_6/Adam_1, Variable_7, Variable_7/Adam, Variable_7/Adam_1, Variable_8, Variable_8/Adam, Variable_8/Adam_1, Variable_9, Variable_9/Adam, Variable_9/Adam_1, beta1_power, beta2_power)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/conda/lib/python3.6/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/opt/conda/lib/python3.6/site-packages/flask_restful/__init__.py", line 480, in wrapper
    resp = resource(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/flask/views.py", line 84, in view
    return self.dispatch_request(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/flask_restful/__init__.py", line 595, in dispatch_request
    resp = meth(*args, **kwargs)
  File "Server.py", line 140, in post
    auto.Do_Autoenc()
  File "/root/dense_autoencoder.py", line 163, in Do_Autoenc
    saver.save(sess, '../data/models/Autoenc.exe.ckpt')
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1593, in save
    {self.saver_def.filename_tensor_name: checkpoint_file})
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1128, in _run
    feed_dict_tensor, options, run_metadata)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1344, in _do_run
    options, run_metadata)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1363, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: ../data/models/Autoenc.exe.ckpt.data-00000-of-00001.tempstate17405837231896083449; Input/output error
         [[Node: save/SaveV2 = SaveV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, Variable, Variable/Adam, Variable/Adam_1, Variable_1, Variable_1/Adam, Variable_1/Adam_1, Variable_10, Variable_10/Adam, Variable_10/Adam_1, Variable_11, Variable_11/Adam, Variable_11/Adam_1, Variable_12, Variable_12/Adam, Variable_12/Adam_1, Variable_13, Variable_13/Adam, Variable_13/Adam_1, Variable_14, Variable_14/Adam, Variable_14/Adam_1, Variable_15, Variable_15/Adam, Variable_15/Adam_1, Variable_2, Variable_2/Adam, Variable_2/Adam_1, Variable_3, Variable_3/Adam, Variable_3/Adam_1, Variable_4, Variable_4/Adam, Variable_4/Adam_1, Variable_5, Variable_5/Adam, Variable_5/Adam_1, Variable_6, Variable_6/Adam, Variable_6/Adam_1, Variable_7, Variable_7/Adam, Variable_7/Adam_1, Variable_8, Variable_8/Adam, Variable_8/Adam_1, Variable_9, Variable_9/Adam, Variable_9/Adam_1, beta1_power, beta2_power)]]

Caused by op 'save/SaveV2', defined at:
  File "Server.py", line 212, in <module>
    app.run(host = '0.0.0.0')
  File "/opt/conda/lib/python3.6/site-packages/flask/app.py", line 841, in run
    run_simple(host, port, self, **options)
  File "/opt/conda/lib/python3.6/site-packages/werkzeug/serving.py", line 739, in run_simple
    inner()
  File "/opt/conda/lib/python3.6/site-packages/werkzeug/serving.py", line 702, in inner
    srv.serve_forever()
  File "/opt/conda/lib/python3.6/site-packages/werkzeug/serving.py", line 539, in serve_forever
    HTTPServer.serve_forever(self)
  File "/opt/conda/lib/python3.6/socketserver.py", line 238, in serve_forever
    self._handle_request_noblock()
  File "/opt/conda/lib/python3.6/socketserver.py", line 317, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/opt/conda/lib/python3.6/socketserver.py", line 348, in process_request
    self.finish_request(request, client_address)
  File "/opt/conda/lib/python3.6/socketserver.py", line 361, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/opt/conda/lib/python3.6/socketserver.py", line 696, in __init__
    self.handle()
  File "/opt/conda/lib/python3.6/site-packages/werkzeug/serving.py", line 232, in handle
    rv = BaseHTTPRequestHandler.handle(self)
  File "/opt/conda/lib/python3.6/http/server.py", line 418, in handle
    self.handle_one_request()
  File "/opt/conda/lib/python3.6/site-packages/werkzeug/serving.py", line 267, in handle_one_request
    return self.run_wsgi()
  File "/opt/conda/lib/python3.6/site-packages/werkzeug/serving.py", line 209, in run_wsgi
    execute(self.server.app)
  File "/opt/conda/lib/python3.6/site-packages/werkzeug/serving.py", line 197, in execute
    application_iter = app(environ, start_response)
  File "/opt/conda/lib/python3.6/site-packages/flask/app.py", line 1997, in __call__
    return self.wsgi_app(environ, start_response)
  File "/opt/conda/lib/python3.6/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/conda/lib/python3.6/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/conda/lib/python3.6/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/opt/conda/lib/python3.6/site-packages/flask_restful/__init__.py", line 480, in wrapper
    resp = resource(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/flask/views.py", line 84, in view
    return self.dispatch_request(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/flask_restful/__init__.py", line 595, in dispatch_request
    resp = meth(*args, **kwargs)
  File "Server.py", line 140, in post
    auto.Do_Autoenc()
  File "/root/dense_autoencoder.py", line 139, in Do_Autoenc
    saver = tf.train.Saver()
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1239, in __init__
    self.build()
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1248, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1284, in _build
    build_save=build_save, build_restore=build_restore)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 762, in _build_internal
    save_tensor = self._AddSaveOps(filename_tensor, saveables)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 297, in _AddSaveOps
    save = self.save_op(filename_tensor, saveables)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 240, in save_op
    tensors)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1174, in save_v2
    shape_and_slices=shape_and_slices, tensors=tensors, name=name)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
    op_def=op_def)
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1625, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

UnknownError (see above for traceback): ../data/models/Autoenc.exe.ckpt.data-00000-of-00001.tempstate17405837231896083449; Input/output error
         [[Node: save/SaveV2 = SaveV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, Variable, Variable/Adam, Variable/Adam_1, Variable_1, Variable_1/Adam, Variable_1/Adam_1, Variable_10, Variable_10/Adam, Variable_10/Adam_1, Variable_11, Variable_11/Adam, Variable_11/Adam_1, Variable_12, Variable_12/Adam, Variable_12/Adam_1, Variable_13, Variable_13/Adam, Variable_13/Adam_1, Variable_14, Variable_14/Adam, Variable_14/Adam_1, Variable_15, Variable_15/Adam, Variable_15/Adam_1, Variable_2, Variable_2/Adam, Variable_2/Adam_1, Variable_3, Variable_3/Adam, Variable_3/Adam_1, Variable_4, Variable_4/Adam, Variable_4/Adam_1, Variable_5, Variable_5/Adam, Variable_5/Adam_1, Variable_6, Variable_6/Adam, Variable_6/Adam_1, Variable_7, Variable_7/Adam, Variable_7/Adam_1, Variable_8, Variable_8/Adam, Variable_8/Adam_1, Variable_9, Variable_9/Adam, Variable_9/Adam_1, beta1_power, beta2_power)]]
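The failing path in the log ends in `.tempstate…`, which hints that the error occurs while a temporary file is being created or renamed on the shared volume. One possible workaround, which is an assumption and not something this post confirms, is to write the checkpoint to a local scratch directory and then copy the finished files onto the volume. The sketch below uses a hypothetical `save_fn` callable so that it does not depend on TensorFlow; with the code from the question it could be `lambda prefix: saver.save(sess, prefix)`.

```python
import glob
import os
import shutil
import tempfile


def save_then_copy(save_fn, checkpoint_name, final_dir):
    """Save into a local scratch directory, then copy the files to final_dir.

    `save_fn` (hypothetical) takes a checkpoint prefix and writes files whose
    names start with that prefix. Returns the names of the copied files.
    """
    scratch = tempfile.mkdtemp(prefix="ckpt_")
    try:
        save_fn(os.path.join(scratch, checkpoint_name))
        # Copy only files that start with the checkpoint name. A separate
        # `checkpoint` state file is skipped because it may record the
        # scratch path rather than the final one.
        pattern = os.path.join(glob.escape(scratch), glob.escape(checkpoint_name) + "*")
        copied = []
        for src in sorted(glob.glob(pattern)):
            shutil.copy(src, final_dir)
            copied.append(os.path.basename(src))
        return copied
    finally:
        # Remove the scratch directory whether or not the copy succeeded.
        shutil.rmtree(scratch, ignore_errors=True)
```

If the volume is not writable at all, the copy step fails as well, but usually with a plainer operating-system error than the one raised from inside the save op.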

If you need more information, feel free to ask further questions. Thanks for your help, because I am starting to lose my mind.

1 Answer:

Answer 0 (score: 0)

I think it could be a permissions problem on the directory '/data/models/'. Please check which user the saver process runs as inside the container.
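A small sketch of that check from Python, run by the same process that calls the saver, might look like this. The helper name `describe_access` is hypothetical; it reports only the process identity and the directory's owner and mode, and the `can_write` field reflects what `os.access` predicts rather than the result of a real write.

```python
import os
import stat


def describe_access(directory):
    """Collect the facts relevant to a permissions check on `directory`."""
    st = os.stat(directory)
    return {
        "process_uid": os.getuid(),
        "process_gid": os.getgid(),
        "dir_owner_uid": st.st_uid,
        "dir_owner_gid": st.st_gid,
        "dir_mode": stat.filemode(st.st_mode),  # for example 'drwxr-xr-x'
        # Creating files in a directory needs write and search permission.
        "can_write": os.access(directory, os.W_OK | os.X_OK),
    }


target = "/data/models"  # directory from the question
if os.path.isdir(target):
    for key, value in describe_access(target).items():
        print(key, "=", value)
```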

To test, run docker exec -it <container> bash and try to create a file in the directory '/data/models/'.
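The same test can be sketched in Python, which has the advantage of running as the exact user and process that call the saver. The helper name `probe_write` is hypothetical. The symbolic errno it returns helps tell a permissions failure (EACCES) apart from the "Input/output error" (EIO) that appears in the log.

```python
import errno
import os
import tempfile


def probe_write(directory):
    """Try to create, write, sync, and remove a small file in `directory`.

    Returns (True, None) on success, or (False, description) on failure.
    """
    try:
        fd, path = tempfile.mkstemp(prefix="write_probe_", dir=directory)
        try:
            os.write(fd, b"probe")
            os.fsync(fd)  # force the data through to the underlying volume
        finally:
            os.close(fd)
            os.remove(path)
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, "unknown")
        return False, "{}: {}".format(name, exc.strerror)
    return True, None


if __name__ == "__main__":
    # Directories taken from the question; one is reported to work, one not.
    for target in ("/data/images", "/data/models"):
        print(target, probe_write(target))
```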