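I converted tf-openpose from the openpose model zoo to TensorRT to speed up inference.
The conversion succeeded, as the UFF converter output below shows. The input node is image and the output node is Openpose/concat_stage7. I then built an engine from the UFF file.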
NOTE: UFF has been tested with TensorFlow 1.12.0. Other versions are not guaranteed to work
UFF Version 0.6.3
=== Automatically deduced input nodes ===
[name: "image"
op: "Placeholder"
attr {
key: "dtype"
value {
type: DT_FLOAT
}
}
attr {
key: "shape"
value {
shape {
dim {
size: -1
}
dim {
size: -1
}
dim {
size: -1
}
dim {
size: 3
}
}
}
}
]
=========================================
=== Automatically deduced output nodes ===
[name: "Openpose/concat_stage7"
op: "ConcatV2"
input: "Mconv7_stage6_L2/BiasAdd"
input: "Mconv7_stage6_L1/BiasAdd"
input: "Openpose/concat_stage7/axis"
attr {
key: "N"
value {
i: 2
}
}
attr {
key: "T"
value {
type: DT_FLOAT
}
}
attr {
key: "Tidx"
value {
type: DT_INT32
}
}
]
==========================================
Using output node Openpose/concat_stage7
Converting to UFF graph
No. nodes: 463
UFF Output written to cmu/cmu_openpose.uff
The engine is deserialized, and inference is run as follows:
import numpy as np
import pycuda.driver as cuda
import tensorrt as trt

# HostDeviceMem is assumed to be a small helper that pairs a host buffer
# with its device buffer (attributes .host and .device).

def infer(engine, x, batch_size, context):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    # Copy the flattened, normalized image into the page-locked input buffer.
    img = np.array(x).ravel()
    np.copyto(inputs[0].host, 1.0 - img / 255.0)
    # Transfer input data to the GPU.
    for inp in inputs:
        cuda.memcpy_htod_async(inp.device, inp.host, stream)
    # Run inference.
    context.execute_async(batch_size=batch_size, bindings=bindings, stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    for out in outputs:
        cuda.memcpy_dtoh_async(out.host, out.device, stream)
    # Synchronize the stream
    stream.synchronize()
    # Return only the host outputs.
    return [out.host for out in outputs]
I need to get heatMat and pafMat from the output, the same way the original tf_pose code does.
The tf_pose code has:
self.tensor_image = self.graph.get_tensor_by_name('TfPoseEstimator/image:0')
self.tensor_output = self.graph.get_tensor_by_name('TfPoseEstimator/Openpose/concat_stage7:0')
self.tensor_heatMat = self.tensor_output[:, :, :, :19]
self.tensor_pafMat = self.tensor_output[:, :, :, 19:]
How can I get heatMat and pafMat from the TensorRT output?
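One possible approach, sketched under assumptions rather than confirmed against this engine: the host output returned by infer is a flat 1-D buffer, so it has to be reshaped before the channel split from tf_pose can be applied. The helper name split_openpose_output, the default of 57 channels (19 heatmaps plus 38 PAF channels), the batch size of 1, and the choice between a channel-first (CHW) and channel-last (HWC) layout are all assumptions here; the output feature-map height and width must be supplied by the caller, and the layout should be checked against the engine's actual output binding shape.

```python
import numpy as np

NUM_HEATMAPS = 19  # same split point as tf_pose: [..., :19] and [..., 19:]


def split_openpose_output(flat_output, height, width, channels=57, layout="CHW"):
    """Reshape a flat output buffer and split it into (heatMat, pafMat).

    Hypothetical helper: `channels=57` and `layout` are assumptions that
    must match the real engine output. Assumes a batch size of 1.
    Both results are returned in (height, width, C) order.
    """
    flat_output = np.asarray(flat_output)
    expected = height * width * channels
    if flat_output.size != expected:
        raise ValueError(
            "buffer has %d elements, expected %d" % (flat_output.size, expected)
        )

    if layout == "CHW":
        # Channel-first buffer: reshape, then move channels to the last axis.
        out = flat_output.reshape(channels, height, width).transpose(1, 2, 0)
    elif layout == "HWC":
        # Channel-last buffer: already in the order tf_pose uses.
        out = flat_output.reshape(height, width, channels)
    else:
        raise ValueError("layout must be 'CHW' or 'HWC'")

    heat_mat = out[:, :, :NUM_HEATMAPS]
    paf_mat = out[:, :, NUM_HEATMAPS:]
    return heat_mat, paf_mat
```

With the infer function above, this would be called on the first returned host buffer, for example split_openpose_output(outputs[0], out_h, out_w), where out_h and out_w are placeholders for the output feature-map size.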