Trying to train the tf-pose-estimation repo on GPU, but it runs on the CPU

Time: 2021-02-06 18:34:57

Tags: python tensorflow deep-learning tensorflow1.15

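I have been trying to use this repo, https://github.com/ildoonet/tf-pose-estimation, to train a pose estimation system. I want the system to estimate animal poses. I converted the XML annotations of the animal pose dataset from VOC format to COCO annotations.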

I start training with the following command, as suggested by tf-pose-estimation:

!python "/content/tf-pose-estimation/tf_pose/train.py" --input-width=368 --input-height=368 --model=cmu --datapath="/content/drive/MyDrive/Dataset" --batchsize=8 --lr=0.001

I am running this on Colab with a GPU runtime. The output is very long but repeats itself, so here is the relevant part first.

2021-02-06 11:30:41.845251: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:*' assigned_device_name_='' resource_device_name_='/device:GPU:*' supported_device_types_=[CPU] possible_devices_=[]
AssignAdd: CPU 
Const: GPU CPU XLA_CPU XLA_GPU 
Identity: GPU CPU XLA_CPU XLA_GPU 
VariableV2: CPU 
Assign: CPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  Variable (VariableV2) /device:GPU:*
  Variable/Assign (Assign) /device:GPU:*
  Variable/read (Identity) /device:GPU:*
  Adam/value (Const) /device:GPU:*
  Adam (AssignAdd) /device:GPU:*
  save/Assign_420 (Assign) /device:GPU:*

[2021-02-06 11:30:51,287] [train] [INFO] Restore pretrained weights... ./models/numpy/openpose_coco.npy
[2021-02-06 11:32:05,186] [train] [INFO] Restore pretrained weights...Done
[2021-02-06 11:32:05,186] [train] [INFO] prepare file writer
WARNING:tensorflow:From /content/tf-pose-estimation/tf_pose/train.py:195: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

[2021-02-06 11:32:08,083] [train] [INFO] prepare coordinator
WARNING:tensorflow:From /content/tf-pose-estimation/tf_pose/pose_dataset.py:432: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

[2021-02-06 11:32:08,088] [train] [INFO] Training Started.
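
One way to read the warning above is to list which op types in the reported colocation group do not include GPU among their supported devices. The sketch below parses only the excerpt shown here; the helper name `parse_colocation_group` is hypothetical, and the result says nothing about where the rest of the graph was placed.

```python
import re

# Colocation debug info copied from the warning above.
LOG_EXCERPT = """\
Colocation group had the following types and supported devices:
AssignAdd: CPU
Const: GPU CPU XLA_CPU XLA_GPU
Identity: GPU CPU XLA_CPU XLA_GPU
VariableV2: CPU
Assign: CPU

Colocation members, user-requested devices, and framework assigned devices, if any:
  Variable (VariableV2) /device:GPU:*
  Variable/Assign (Assign) /device:GPU:*
  Variable/read (Identity) /device:GPU:*
  Adam/value (Const) /device:GPU:*
  Adam (AssignAdd) /device:GPU:*
  save/Assign_420 (Assign) /device:GPU:*
"""

# "OpType: DEVICE DEVICE ..." lines (no leading whitespace).
OP_LINE = re.compile(r"^(\w+):\s+([A-Z_ ]+?)\s*$")
# "  node_name (OpType) requested_device" lines (indented).
MEMBER_LINE = re.compile(r"^\s+(\S+) \((\w+)\) (\S+)\s*$")


def parse_colocation_group(text):
    """Return ({op_type: [devices]}, [(node, op_type, requested_device)])."""
    supported = {}
    members = []
    for line in text.splitlines():
        op_match = OP_LINE.match(line)
        if op_match:
            supported[op_match.group(1)] = op_match.group(2).split()
            continue
        member_match = MEMBER_LINE.match(line)
        if member_match:
            members.append(member_match.groups())
    return supported, members


supported, members = parse_colocation_group(LOG_EXCERPT)
# Op types whose supported devices do not include GPU.
cpu_only = sorted(op for op, devs in supported.items() if "GPU" not in devs)
print("op types without GPU support in this group:", cpu_only)
print("nodes in this group:", [name for name, _, _ in members])
```

In this excerpt the group contains a single variable (`Variable`) together with the ops that read, assign, and increment it, so the warning on its own covers that one group rather than the whole model.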

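I really need help with this. I have tried many versions of tensorboard, numpy, and tensorflow to get it working, and I have installed different CUDA versions on Colab, but I still have not solved it.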

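You might think the problem is the annotation conversion, but I also tried the original COCO dataset and nothing changed. If you have any evidence that the annotations are the problem, please say so.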

I created the annotations with this tool: https://github.com/roboflow-ai/voc2coco. I only modified it slightly because of a problem reading the XML.
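
For reference, the core of a VOC-to-COCO conversion is the bounding-box change from corner coordinates (`xmin, ymin, xmax, ymax`) to COCO's `[x, y, width, height]`. The following is a minimal sketch of that step only, not the voc2coco tool's code; the sample XML, the function name `voc_to_coco`, and the category mapping are made up for illustration, and the real dataset's XML layout may differ. It also leaves out keypoints, which a pose model additionally needs.

```python
import xml.etree.ElementTree as ET

# Hypothetical VOC-style annotation; real files may carry more fields.
SAMPLE_VOC_XML = """<annotation>
  <filename>dog_001.jpg</filename>
  <size><width>500</width><height>375</height></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>"""


def voc_to_coco(xml_text, image_id, category_ids):
    """Convert one VOC XML document into a COCO image entry and its annotations."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    image = {
        "id": image_id,
        "file_name": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
    }
    annotations = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.findtext(key)) for key in ("xmin", "ymin", "xmax", "ymax")
        )
        width, height = xmax - xmin, ymax - ymin
        annotations.append({
            "id": len(annotations) + 1,  # unique only within this image here
            "image_id": image_id,
            "category_id": category_ids[obj.findtext("name")],
            "bbox": [xmin, ymin, width, height],  # COCO: top-left corner + size
            "area": width * height,
            "iscrowd": 0,
        })
    return image, annotations


image, annotations = voc_to_coco(SAMPLE_VOC_XML, image_id=1, category_ids={"dog": 1})
print(image)
print(annotations)
```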

In case you think the problem lies elsewhere, here is the full output:

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorpack/tfutils/common.py:138: The name tf.VERSION is deprecated. Please use tf.version.VERSION instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorpack/models/tflayer.py:90: The name tf.layers.Dense is deprecated. Please use tf.compat.v1.layers.Dense instead.

WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorpack/callbacks/graph.py:82: The name tf.GraphKeys is deprecated. Please use tf.compat.v1.GraphKeys instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorpack/callbacks/hooks.py:15: The name tf.train.SessionRunHook is deprecated. Please use tf.estimator.SessionRunHook instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorpack/tfutils/optimizer.py:15: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorpack/tfutils/sesscreate.py:21: The name tf.train.ChiefSessionCreator is deprecated. Please use tf.compat.v1.train.ChiefSessionCreator instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorpack/tfutils/sesscreate.py:45: The name tf.train.SessionCreator is deprecated. Please use tf.compat.v1.train.SessionCreator instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tf_pose/mobilenet/mobilenet.py:376: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.

[2021-02-06 11:30:22,267] [train] [INFO] define model+
WARNING:tensorflow:From /content/tf-pose-estimation/tf_pose/train.py:63: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

[2021-02-06 11:30:22,270] [pose_dataset] [INFO] dataflow img_path=/data/public/rw/coco/
loading annotations into memory...
Done (t=6.70s)
creating index...
index created!
[2021-02-06 11:30:29,588] [pose_dataset] [INFO] /content/drive/MyDrive/KuzuFAB dataset 82783
[0206 11:30:29 @parallel.py:178] [MultiProcessPrefetchData] Will fork a dataflow more than one times. This assumes the datapoints are i.i.d.
WARNING:tensorflow:From /content/tf-pose-estimation/tf_pose/pose_dataset.py:410: The name tf.FIFOQueue is deprecated. Please use tf.queue.FIFOQueue instead.

Process _Worker-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/dataflow/parallel.py", line 159, in run
    for dp in self.ds.get_data():
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/dataflow/common.py", line 274, in get_data
    for dp in self.ds.get_data():
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/dataflow/common.py", line 274, in get_data
    for dp in self.ds.get_data():
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/dataflow/common.py", line 274, in get_data
    for dp in self.ds.get_data():
  [Previous line repeated 3 more times]
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/dataflow/common.py", line 275, in get_data
    ret = self.func(copy(dp))  # shallow copy the list
  File "/content/tf-pose-estimation/tf_pose/pose_dataset.py", line 345, in read_image_url
    img_str = open(meta.img_url, 'rb').read()
FileNotFoundError: [Errno 2] No such file or directory: '/data/public/rw/coco/train2017/COCO_train2014_000000052209.jpg'
[2021-02-06 11:30:29,635] [pose_dataset] [INFO] dataflow img_path=/data/public/rw/coco/
loading annotations into memory...
Process _Worker-2:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/dataflow/parallel.py", line 159, in run
    for dp in self.ds.get_data():
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/dataflow/common.py", line 274, in get_data
    for dp in self.ds.get_data():
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/dataflow/common.py", line 274, in get_data
    for dp in self.ds.get_data():
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/dataflow/common.py", line 274, in get_data
    for dp in self.ds.get_data():
  [Previous line repeated 3 more times]
  File "/usr/local/lib/python3.6/dist-packages/tensorpack/dataflow/common.py", line 275, in get_data
    ret = self.func(copy(dp))  # shallow copy the list
  File "/content/tf-pose-estimation/tf_pose/pose_dataset.py", line 345, in read_image_url
    img_str = open(meta.img_url, 'rb').read()
FileNotFoundError: [Errno 2] No such file or directory: '/data/public/rw/coco/train2017/COCO_train2014_000000536290.jpg'
Done (t=3.61s)
creating index...
index created!
[2021-02-06 11:30:33,328] [pose_dataset] [INFO] /content/drive/MyDrive/KuzuFAB dataset 40504
[2021-02-06 11:30:33,486] [train] [DEBUG] tensorboard val image: 12
[2021-02-06 11:30:33,486] [train] [DEBUG] Tensor("fifo_queue_Dequeue:0", shape=(8, 368, 368, 3), dtype=float32, device=/device:CPU:*)
[2021-02-06 11:30:33,486] [train] [DEBUG] Tensor("fifo_queue_Dequeue:1", shape=(8, 46, 46, 19), dtype=float32, device=/device:CPU:*)
[2021-02-06 11:30:33,487] [train] [DEBUG] Tensor("fifo_queue_Dequeue:2", shape=(8, 46, 46, 38), dtype=float32, device=/device:CPU:*)
WARNING:tensorflow:From /content/tf-pose-estimation/tf_pose/train.py:93: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From /content/tf-pose-estimation/tf_pose/train.py:93: The name tf.get_variable_scope is deprecated. Please use tf.compat.v1.get_variable_scope instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tf_pose/network_base.py:61: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tf_pose/network_base.py:145: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tf_pose/network_base.py:328: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.

WARNING:tensorflow:From /content/tf-pose-estimation/tf_pose/train.py:127: The name tf.train.cosine_decay is deprecated. Please use tf.compat.v1.train.cosine_decay instead.

WARNING:tensorflow:From /content/tf-pose-estimation/tf_pose/train.py:139: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

WARNING:tensorflow:From /content/tf-pose-estimation/tf_pose/train.py:141: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.

[2021-02-06 11:30:40,066] [train] [INFO] define model-
WARNING:tensorflow:From /content/tf-pose-estimation/tf_pose/train.py:147: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

WARNING:tensorflow:From /content/tf-pose-estimation/tf_pose/train.py:153: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.

WARNING:tensorflow:From /content/tf-pose-estimation/tf_pose/train.py:161: The name tf.summary.image is deprecated. Please use tf.compat.v1.summary.image instead.

WARNING:tensorflow:From /content/tf-pose-estimation/tf_pose/train.py:165: The name tf.summary.merge is deprecated. Please use tf.compat.v1.summary.merge instead.

WARNING:tensorflow:From /content/tf-pose-estimation/tf_pose/train.py:167: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

WARNING:tensorflow:From /content/tf-pose-estimation/tf_pose/train.py:168: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /content/tf-pose-estimation/tf_pose/train.py:170: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2021-02-06 11:30:40.475199: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2021-02-06 11:30:40.475768: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x6e82fc0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-02-06 11:30:40.475811: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-02-06 11:30:40.479002: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-02-06 11:30:40.656755: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-06 11:30:40.657473: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x6e83180 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-02-06 11:30:40.657509: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2021-02-06 11:30:40.658608: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-06 11:30:40.659150: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2021-02-06 11:30:40.660778: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-02-06 11:30:40.784657: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-02-06 11:30:40.881881: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-02-06 11:30:40.884487: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-02-06 11:30:41.184723: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-02-06 11:30:41.186498: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-02-06 11:30:41.681187: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-02-06 11:30:41.681483: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-06 11:30:41.682184: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-06 11:30:41.682679: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2021-02-06 11:30:41.682905: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-02-06 11:30:41.684177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-02-06 11:30:41.684208: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 
2021-02-06 11:30:41.684218: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N 
2021-02-06 11:30:41.684704: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-06 11:30:41.685530: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-02-06 11:30:41.686162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14221 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
[2021-02-06 11:30:41,687] [train] [INFO] model weights initialization
WARNING:tensorflow:From /content/tf-pose-estimation/tf_pose/train.py:172: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

2021-02-06 11:30:41.845251: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
  /job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:*' assigned_device_name_='' resource_device_name_='/device:GPU:*' supported_device_types_=[CPU] possible_devices_=[]
AssignAdd: CPU 
Const: GPU CPU XLA_CPU XLA_GPU 
Identity: GPU CPU XLA_CPU XLA_GPU 
VariableV2: CPU 
Assign: CPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  Variable (VariableV2) /device:GPU:*
  Variable/Assign (Assign) /device:GPU:*
  Variable/read (Identity) /device:GPU:*
  Adam/value (Const) /device:GPU:*
  Adam (AssignAdd) /device:GPU:*
  save/Assign_420 (Assign) /device:GPU:*

[2021-02-06 11:30:51,287] [train] [INFO] Restore pretrained weights... ./models/numpy/openpose_coco.npy
[2021-02-06 11:32:05,186] [train] [INFO] Restore pretrained weights...Done
[2021-02-06 11:32:05,186] [train] [INFO] prepare file writer
WARNING:tensorflow:From /content/tf-pose-estimation/tf_pose/train.py:195: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

[2021-02-06 11:32:08,083] [train] [INFO] prepare coordinator
WARNING:tensorflow:From /content/tf-pose-estimation/tf_pose/pose_dataset.py:432: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

[2021-02-06 11:32:08,088] [train] [INFO] Training Started.
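
The full output also contains `FileNotFoundError` tracebacks from the data workers, for image paths under `/data/public/rw/coco/`, which is the `img_path` the log reports. One check worth running before training is whether the image files named in the annotation file exist under the directory the loader will read from. This is a minimal sketch that assumes a standard COCO JSON with an `images` list whose entries have a `file_name`; the function name `find_missing_images` is hypothetical, and the way tf-pose-estimation itself builds image paths is not reproduced here.

```python
import json
import os


def find_missing_images(annotation_file, image_root, limit=10):
    """Return up to `limit` image paths from a COCO file that are not on disk."""
    with open(annotation_file) as f:
        coco = json.load(f)
    missing = []
    for image in coco.get("images", []):
        path = os.path.join(image_root, image["file_name"])
        if not os.path.isfile(path):
            missing.append(path)
            if len(missing) >= limit:
                break  # a few examples are enough to spot a wrong root
    return missing
```

Calling it with the annotation JSON and the image directory used for training returns an empty list when every referenced image is present; any returned paths show which directory is actually being searched.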

0 answers:

No answers