How do I extract the correct output after TensorRT processing?

Time: 2019-05-30 08:10:31

Tags: python tensorflow tensorrt openpose

I converted tf-openpose from the openpose model repository to TensorRT to speed up processing. The conversion succeeded, as the UFF converter log below shows: the input is image and the output is Openpose/concat_stage7. The UFF file was then converted to an engine.

NOTE: UFF has been tested with TensorFlow 1.12.0. Other versions are not guaranteed to work
UFF Version 0.6.3
=== Automatically deduced input nodes ===
[name: "image"
op: "Placeholder"
attr {
  key: "dtype"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "shape"
  value {
    shape {
      dim {
        size: -1
      }
      dim {
        size: -1
      }
      dim {
        size: -1
      }
      dim {
        size: 3
      }
    }
  }
}
]
=========================================

=== Automatically deduced output nodes ===
[name: "Openpose/concat_stage7"
op: "ConcatV2"
input: "Mconv7_stage6_L2/BiasAdd"
input: "Mconv7_stage6_L1/BiasAdd"
input: "Openpose/concat_stage7/axis"
attr {
  key: "N"
  value {
    i: 2
  }
}
attr {
  key: "T"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "Tidx"
  value {
    type: DT_INT32
  }
}
]
==========================================

Using output node Openpose/concat_stage7
Converting to UFF graph
No. nodes: 463
UFF Output written to cmu/cmu_openpose.uff
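
For reference, the log above is what the UFF converter (uff.from_tensorflow_frozen_model or the convert-to-uff tool) prints. A minimal sketch of the next step, parsing the .uff file into an engine with the TensorRT 5.x Python API, might look like this; the fixed input resolution (3, 368, 432) is an assumption, since the UFF parser needs concrete dimensions while the graph above uses -1:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(uff_path="cmu/cmu_openpose.uff"):
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network() as network, \
         trt.UffParser() as parser:
        builder.max_batch_size = 1
        builder.max_workspace_size = 1 << 30  # 1 GiB build workspace
        # CHW input dims are assumed; UFF parsing requires fixed sizes.
        parser.register_input("image", (3, 368, 432))
        parser.register_output("Openpose/concat_stage7")
        parser.parse(uff_path, network)
        return builder.build_cuda_engine(network)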

The engine is deserialized, and inference is performed as follows:

import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # initializes the CUDA context
import tensorrt as trt

# Simple container pairing a pagelocked host buffer with its device buffer.
class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

def infer(engine, x, batch_size, context):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers.
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    # Flatten and normalize the image, then copy it into the input buffer.
    img = np.array(x).ravel()
    np.copyto(inputs[0].host, 1.0 - img / 255.0)
    # Transfer input data to the GPU.
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    # Run inference.
    context.execute_async(batch_size=batch_size, bindings=bindings, stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    # Synchronize the stream.
    stream.synchronize()
    # Return only the host outputs.
    return [out.host for out in outputs]
I need to get heatMat and pafMat from the output, as in the original tf_pose processing. The tf_pose processing has:

self.tensor_image = self.graph.get_tensor_by_name('TfPoseEstimator/image:0')
self.tensor_output = self.graph.get_tensor_by_name('TfPoseEstimator/Openpose/concat_stage7:0')
self.tensor_heatMat = self.tensor_output[:, :, :, :19]
self.tensor_pafMat = self.tensor_output[:, :, :, 19:]
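
By analogy, the flat host buffer that infer returns for the Openpose/concat_stage7 binding can be reshaped to the binding shape and split the same way. A sketch, assuming the output binding is index 1 and laid out CHW (the UFF parser's default NCHW order), e.g. (57, H, W) = 19 heat-map channels plus 38 PAF channels:

import numpy as np

def split_output(engine, out_host):
    # Assumed: binding 1 is Openpose/concat_stage7 in CHW layout.
    c, h, w = engine.get_binding_shape(1)
    out = np.asarray(out_host).reshape(c, h, w).transpose(1, 2, 0)  # CHW -> HWC
    heatMat = out[:, :, :19]   # body-part heat maps
    pafMat = out[:, :, 19:]    # part affinity fields
    return heatMat, pafMat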

How can I get heatMat and pafMat from the TensorRT output?

0 Answers:

No answers yet.