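I am trying to reproduce a multi-GPU version of the code, with only minor changes to the ResNet model architecture (everything else is the same), as shown at https://github.com/FlyEgle/keras-yolo3 in train_height_point.py. Direct link: https://github.com/FlyEgle/keras-yolo3/blob/master/train_height_point.py
The error appears to be in the yolo_loss function.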
I tried modifying the while_loop and other tricks mentioned in other Stack Overflow answers and issues:
Gradients error using TensorArray Tensorflow
TensorArray TensorArray_1_0: Could not read from TensorArray index 0 because it has not yet been written to
https://github.com/tensorflow/tensorflow/issues/3663
When I run the code, I get the following error during the first epoch:
Train on 62880 samples, val on 6976 samples, with batch size 1.
Epoch 1/400
2019-06-28 18:39:30.247036: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at tensor_array_ops.cc:661 : Invalid argument: TensorArray replica_0/model_3/yolo_loss/TensorArray_3: Could not read from TensorArray index 0. Furthermore, the element shape is not fully defined: [?,?,3]. It is possible you are working with a resizeable TensorArray and stop_gradients is not allowing the gradients to be written. If you set the full element_shape property on the forward TensorArray, the proper all-zeros tensor will be returned instead of incurring this error.
2019-06-28 18:39:30.251868: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at tensor_array_ops.cc:661 : Invalid argument: TensorArray replica_0/model_3/yolo_loss/TensorArray_1_4: Could not read from TensorArray index 0. Furthermore, the element shape is not fully defined: [?,?,3]. It is possible you are working with a resizeable TensorArray and stop_gradients is not allowing the gradients to be written. If you set the full element_shape property on the forward TensorArray, the proper all-zeros tensor will be returned instead of incurring this error.
2019-06-28 18:39:30.251942: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at tensor_array_ops.cc:661 : Invalid argument: TensorArray replica_0/model_3/yolo_loss/TensorArray_2_5: Could not read from TensorArray index 0. Furthermore, the element shape is not fully defined: [?,?,3]. It is possible you are working with a resizeable TensorArray and stop_gradients is not allowing the gradients to be written. If you set the full element_shape property on the forward TensorArray, the proper all-zeros tensor will be returned instead of incurring this error.
2019-06-28 18:39:31.368047: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
Traceback (most recent call last):
File "train.py", line 517, in <module>
_main()
File "train.py", line 177, in _main
callbacks=[logging, lr_schedule, checkpoint]
File "/opt/conda/lib/python3.7/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/keras/engine/training.py", line 1418, in fit_generator
initial_epoch=initial_epoch)
File "/opt/conda/lib/python3.7/site-packages/keras/engine/training_generator.py", line 217, in fit_generator
class_weight=class_weight)
File "/opt/conda/lib/python3.7/site-packages/keras/engine/training.py", line 1217, in train_on_batch
outputs = self.train_function(ins)
File "/opt/conda/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
File "/opt/conda/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
run_metadata_ptr)
File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: TensorArray replica_0/model_3/yolo_loss/TensorArray_3: Could not read from TensorArray index 0. Furthermore, the element shape is not fully defined: [?,?,3]. It is possible you are working with a resizeable TensorArray and stop_gradients is not allowing the gradients to be written. If you set the full element_shape property on the forward TensorArray, the proper all-zeros tensor will be returned instead of incurring this error.
[[{{node replica_0/model_3/yolo_loss/TensorArrayStack/TensorArrayGatherV3}}]]
[[{{node loss/add_20}}]]
Answer 0 (score: 0)
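According to the stack trace above, you need to pass an argument named element_shape whose value is fully defined, such as element_shape=(10, 10, 10), rather than None or element_shape=(None, 10, 10). It seems that unknown dimensions are not allowed here.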
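As a rough illustration of what "fully defined" means here, the following sketch checks a shape the same way the error message describes it: every dimension must be a known integer. The helper names `is_fully_defined` and `require_full_shape` are hypothetical, and the `(13, 13, 3)` shape in the trailing comment is only an assumed example, not a value taken from the repository.

```python
from typing import Optional, Sequence


def is_fully_defined(shape: Optional[Sequence[Optional[int]]]) -> bool:
    """Return True only if a shape is given and has no unknown (None) dimension."""
    return shape is not None and all(dim is not None for dim in shape)


def require_full_shape(shape):
    """Hypothetical guard: validate a shape before using it as element_shape."""
    if not is_fully_defined(shape):
        raise ValueError(f"element_shape must be fully defined, got {shape!r}")
    return tuple(shape)


# The shape reported in the error, [?,?,3], has two unknown dimensions.
print(is_fully_defined((None, None, 3)))  # False
print(is_fully_defined(None))             # False
print(is_fully_defined((10, 10, 10)))     # True

# Assumed usage with TensorFlow (not executed here; the shape is an example):
#   ignore_mask = tf.TensorArray(dtype, size=1, dynamic_size=True,
#                                element_shape=require_full_shape((13, 13, 3)))
```

A guard like this only catches the problem early. Whether a fixed element shape is practical depends on whether the grid size and the number of boxes per image are constant in your training setup.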
I ran into this problem as well and am still trying to find a better way to solve it.