create_inference_graph in tf_trt

Date: 2019-04-18 13:13:40

Tags: tensorflow python-3.6 tensorrt

I am trying to convert an ssdlite_mobilenet_v2 model from TensorFlow to TensorRT with tf_trt, following the instructions in this [link][1]. I get an `Aborted (core dumped)` error. What is really strange is that I did exactly the same thing (with the same program) on the same graph architecture, but trained on a different dataset, and it ran without errors.

OS: Ubuntu 18.04.2, GPU: Tesla M60, TensorFlow 1.13.1

I tried changing `max_batch_size` and `max_workspace_size_bytes`, but the problem does not seem to be caused by running out of GPU memory: the process never appears to use more than 1.5 GB.

```python
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt
from tf_trt_models.detection import build_detection_graph  # from the linked repo

# Build and freeze the detection graph from the training checkpoint.
frozen_graph, input_names, output_names = build_detection_graph(
    config="pipeline.config",
    checkpoint="model.ckpt-75000"
)

# Convert the frozen graph with TF-TRT.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=50
)

# Serialize the converted GraphDef to disk.
with open("trt_graph.pb", "wb") as f:
    f.write(trt_graph.SerializeToString())
```

2019-04-18 12:45:50.313642: I tensorflow/contrib/tensorrt/segment/segment.cc:443] There are 169 ops of 35 different types in the graph that are not converted to TensorRT: Range, GreaterEqual, Greater, Split, TopKV2, Select, Less, Slice, Identity, BiasAdd, Reshape, Mul, Fill, Squeeze, Const, Unpack, ResizeBilinear, GatherV2, NonMaxSuppressionV3, Where, ExpandDims, Cast, Minimum, Sum, Sub, Pack, Transpose, Pad, ConcatV2, Exp, Placeholder, Add, Shape, NoOp, StridedSlice, (For more information see https://docs.nvidia.com/deeplearning/dgx/integrate-tf-trt/index.html#support-ops).
2019-04-18 12:45:51.094322: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:913] Number of TensorRT candidate segments: 2
2019-04-18 12:45:51.146102: W tensorflow/contrib/tensorrt/log/trt_logger.cc:34] DefaultLogger Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
2019-04-18 12:46:15.758417: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:1015] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 275 nodes succeeded.
2019-04-18 12:46:15.801363: W tensorflow/contrib/tensorrt/log/trt_logger.cc:34] DefaultLogger Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
2019-04-18 12:47:02.994309: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:1015] TensorRT node TRTEngineOp_1 added for segment 1 consisting of 684 nodes succeeded.
2019-04-18 12:47:03.494635: F tensorflow/core/graph/graph.cc:659] Check failed: inputs[edge->dst_input()] == nullptr Edge {name:'TRTEngineOp_1' id:1323 op device:{} def:{{{node TRTEngineOp_1}} = TRTEngineOp[InT=[DT_FLOAT], OutT=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], cached_engine_batches=[1], calibration_data="", fixed_input_size=true, input_shapes=[[1,300,300,3]], max_cached_engines_count=10, output_shapes=[[1,576,19,19], [1,1280,10,10], [1,512,5,5], [1,256,3,3], [1,24,3,3]], precision_mode="FP16", segment_funcdef_name="TRTEngineOp_1_native_segment", serialized_segment="\310\265\2...00\000\000", static_engine=true, use_calibration=false, workspace_size_bytes=11966231, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Preprocessor/stack, ^const6)}}:{name:'TRTEngineOp_0' id:1322 op device:{} def:{{{node TRTEngineOp_0}} = TRTEngineOp[InT=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], OutT=[DT_FLOAT, DT_FLOAT], cached_engine_batches=[1], calibration_data="", fixed_input_size=true, input_shapes=[[1,256,3,3], [1,512,5,5], [1,1280,10,10], [1,576,19,19], [1,24,3,3]], max_cached_engines_count=10, output_shapes=[[1,1917,4], [1,1917,3]], precision_mode="FP16", segment_funcdef_name="TRTEngineOp_0_native_segment", serialized_segment="\360o\021\...00\000\000", static_engine=true, use_calibration=false, workspace_size_bytes=4810985, _device="/job:localhost/replica:0/task:0/device:GPU:0"](FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_3_3x3_s2_256/Relu6, FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_2_3x3_s2_512/Relu6, FeatureExtractor/MobilenetV2/Conv_1/Relu6, FeatureExtractor/MobilenetV2/expanded_conv_13/expansion_output, BoxPredictor_3/BoxEncodingPredictor/BiasAdd, ^Postprocessor/scale_logits/y, ^BoxPredictor_4/BoxEncodingPredictor/biases/read, ^BoxPredictor_5/BoxEncodingPredictor/biases/read, ^const6)}} with dst_input 0 and had pre-existing input edge {name:'TRTEngineOp_1' id:1323 op device:{} def:{{{node TRTEngineOp_1}} = TRTEngineOp[InT=[DT_FLOAT], 
OutT=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], cached_engine_batches=[1], calibration_data="", fixed_input_size=true, input_shapes=[[1,300,300,3]], max_cached_engines_count=10, output_shapes=[[1,576,19,19], [1,1280,10,10], [1,512,5,5], [1,256,3,3], [1,24,3,3]], precision_mode="FP16", segment_funcdef_name="TRTEngineOp_1_native_segment", serialized_segment="\310\265\2...00\000\000", static_engine=true, use_calibration=false, workspace_size_bytes=11966231, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Preprocessor/stack, ^const6)}}:{name:'TRTEngineOp_0' id:1322 op device:{} def:{{{node TRTEngineOp_0}} = TRTEngineOp[InT=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], OutT=[DT_FLOAT, DT_FLOAT], cached_engine_batches=[1], calibration_data="", fixed_input_size=true, input_shapes=[[1,256,3,3], [1,512,5,5], [1,1280,10,10], [1,576,19,19], [1,24,3,3]], max_cached_engines_count=10, output_shapes=[[1,1917,4], [1,1917,3]], precision_mode="FP16", segment_funcdef_name="TRTEngineOp_0_native_segment", serialized_segment="\360o\021\...00\000\000", static_engine=true, use_calibration=false, workspace_size_bytes=4810985, _device="/job:localhost/replica:0/task:0/device:GPU:0"](FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_3_3x3_s2_256/Relu6, FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_2_3x3_s2_512/Relu6, FeatureExtractor/MobilenetV2/Conv_1/Relu6, FeatureExtractor/MobilenetV2/expanded_conv_13/expansion_output, BoxPredictor_3/BoxEncodingPredictor/BiasAdd, ^Postprocessor/scale_logits/y, ^BoxPredictor_4/BoxEncodingPredictor/biases/read, ^BoxPredictor_5/BoxEncodingPredictor/biases/read, ^const6)}}
Aborted (core dumped)





  [1]: https://github.com/NVIDIA-AI-IOT/tf_trt_models

1 Answer:

Answer 0 (score: 0)

Could you please retry calling `create_inference_graph` with the parameter `is_dynamic_op=True`?
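A sketch of the suggested retry, applied to the conversion call from the question (this assumes the TF 1.13 contrib API used in the question, and that `frozen_graph` and `output_names` were already produced by the question's `build_detection_graph` call):

```python
import tensorflow.contrib.tensorrt as trt

# Same call as in the question, but with is_dynamic_op=True so TensorRT
# engines are built lazily at session run time from the observed input
# shapes, instead of statically during conversion.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=50,
    is_dynamic_op=True,
)
```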

It would also be good to increase the TensorFlow log verbosity.
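For example, via the environment variables that TensorFlow's C++ runtime reads at startup (the VLOG level to pick is a judgment call; higher values print more converter detail):

```shell
# Show all INFO-and-above logs and enable detailed VLOG output from the
# TF-TRT converter, then re-run the conversion script in this shell.
export TF_CPP_MIN_LOG_LEVEL=0
export TF_CPP_MIN_VLOG_LEVEL=2
echo "TF_CPP_MIN_VLOG_LEVEL=$TF_CPP_MIN_VLOG_LEVEL"
```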

Also check against the latest TensorFlow; you can try the nightly containers from Docker Hub.