Question

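I trained a small CNN on my GPU using the NCHW data format, and now I want to export a .pb file that I can use for inference in other applications.

I wrote a small helper function that calls TensorFlow's freeze_graph function with default values, given a directory containing the checkpoint files and graph.pbtxt: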

import os
import argparse
#os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
import tensorflow as tf
dir(tf.contrib) #fix for tf.contrib undefined ops bug
from tensorflow.python.tools.freeze_graph import freeze_graph 

def my_freeze_graph_2(model_dir, output_node_names):
    """Extract the subgraph defined by the output nodes and convert
    all its variables into constants.

    Args:
        model_dir: the root folder containing the checkpoint state file
        output_node_names: a string containing all the output node names,
                           comma separated
    """
    if not tf.gfile.Exists(model_dir):
        raise AssertionError(
            "Export directory doesn't exist. Please specify an export "
            "directory: %s" % model_dir)

    if not output_node_names:
        print("You need to supply the name of a node to --output_node_names.")
        return -1

    # Retrieve the full path of the latest checkpoint
    checkpoint = tf.train.get_checkpoint_state(model_dir)
    input_checkpoint = checkpoint.model_checkpoint_path

    # Full file name of the frozen graph
    absolute_model_dir = os.path.abspath(model_dir)
    output_graph = os.path.join(absolute_model_dir, "frozen_model.pb")

    freeze_graph(input_graph=os.path.join(model_dir, 'graph.pbtxt'),
                 input_saver='',
                 input_binary=False,
                 input_checkpoint=input_checkpoint,
                 output_node_names=output_node_names,
                 restore_op_name="save/restore_all",
                 filename_tensor_name="save/Const:0",
                 output_graph=output_graph,
                 clear_devices=True,
                 initializer_nodes='')

Then I have a small script that tries to build the graph from frozen_model.pb, to test whether the freeze actually worked:

import os
#os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
import argparse
import tensorflow as tf
from freeze_graph import load_graph
import cv2

if __name__ == '__main__':
    # Let's allow the user to pass the filename as an argument
    parser = argparse.ArgumentParser()
    parser.add_argument("--frozen_model_filename", default="model-multiple_starts/frozen_model.pb", type=str, help="Frozen model file to import")
    args = parser.parse_args()

    # We use our "load_graph" function
    graph = load_graph(args.frozen_model_filename)

    # We can verify that we can access the list of operations in the graph
    for op in graph.get_operations():
        print(op.name)

    # We access the input and output nodes
    x = graph.get_tensor_by_name('prefix/Reshape:0')
    y = graph.get_tensor_by_name('prefix/softmax_tensor:0')

    # We launch a Session
    with tf.Session(graph=graph, config=tf.ConfigProto(log_device_placement=True)) as sess:
        # Note: we don't need to initialize/restore anything.
        # There are no Variables in this graph, only hardcoded constants.

        # Load an image to use as a test
        im = cv2.imread('57_00000000.png', cv2.IMREAD_GRAYSCALE)
        im = im.T
        im = im / 255 - 0.5
        im = im[None,:,:,None]

        y_out = sess.run(y, feed_dict={
            x: im
        })
        print(y_out)

If I try to run the test script, I get the following error:

InvalidArgumentError: CPU BiasOp only supports NHWC. [[Node: prefix/conv2d/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](prefix/conv2d/convolution, prefix/conv2d/bias/read)]]

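I have tried different configurations:

Generating the .pb file from a CPU-only script, running it CPU-only
Generating the .pb file from a script with the GPU visible, running it with the GPU visible
Generating the .pb file from a CPU-only script, running it with the GPU visible

All of them raise the same error.

The problem is that the checkpoint I am trying to freeze has ops defined with data_format='NCHW'. How can I freeze the checkpoint using the NHWC data format?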

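Update

Looking through the files, I found that in graph.pbtxt many ops have data_format hard-coded to NCHW. I suppose, then, that I need to create a new model in NHWC format, selectively load the layers' weights from the checkpoint, and use that graph to save the .pb file manually... I had assumed there would be an established process for this, but I cannot find any documentation about it, nor any examples.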
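
As a rough way to confirm which layouts are baked into a graph.pbtxt, the data_format attributes can be counted with a regular expression. This is only a sketch: count_data_formats and the SAMPLE string are hypothetical names made up for illustration, and the pattern assumes the usual text-format layout of an attr entry (key, then value { s: "..." }).

```python
import re
from collections import Counter

# Matches a data_format attribute in a text-format GraphDef, e.g.
#   attr { key: "data_format" value { s: "NCHW" } }
# (assumption: the attr is serialized as key followed by a string value)
DATA_FORMAT_ATTR = re.compile(
    r'key:\s*"data_format"\s*value\s*\{\s*s:\s*"([A-Za-z]+)"')


def count_data_formats(pbtxt_text):
    """Count how often each data_format value appears in a .pbtxt string."""
    return Counter(DATA_FORMAT_ATTR.findall(pbtxt_text))


# Tiny hand-written sample standing in for a real graph.pbtxt (illustrative only).
SAMPLE = '''
node {
  name: "conv2d/BiasAdd"
  op: "BiasAdd"
  attr {
    key: "T"
    value {
      type: DT_FLOAT
    }
  }
  attr {
    key: "data_format"
    value {
      s: "NCHW"
    }
  }
}
'''

print(count_data_formats(SAMPLE))
```

For a real file, the same function can be applied to the text read from graph.pbtxt; any NCHW count above zero means the frozen graph will still contain ops that the CPU kernels may reject.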

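Update 2:

After trying to import the .pb file in OpenCV's DNN module, I found the following:

Freezing a checkpoint from training with data format NCHW together with a graph.pbtxt with data format NHWC produces an unusable .pb file. I have not found the exact reason yet, but after converting the .pb to .pbtxt and comparing it with a working frozen graph, the files differ only in the values stored in the weight and bias constants.
Freezing a checkpoint from training with data format NHWC together with a graph.pbtxt that also uses NHWC produces a working frozen graph.

It seems, then, that checkpoints are not transferable between graphs with different data formats (even though no error or warning is raised during freezing).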
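
One possible explanation, offered as an assumption because the question does not show the model: convolution kernels are stored as [height, width, in, out] and biases per channel in both layouts, so those variables restore cleanly either way. A dense layer that follows a flatten is different, since flattening a [C, H, W] feature map and a [H, W, C] feature map puts the same values in a different order, and the dense kernel's rows then no longer line up. The sketch below uses hypothetical helper names and plain Python lists to show the row permutation that would map a kernel trained on NCHW-flattened input to one that expects NHWC-flattened input.

```python
def nchw_flat_index(c, h, w, shape):
    """Flat position of element (c, h, w) when a [C, H, W] map is flattened."""
    _, H, W = shape
    return (c * H + h) * W + w


def nhwc_flat_index(c, h, w, shape):
    """Flat position of the same element when a [H, W, C] map is flattened."""
    C, _, W = shape
    return (h * W + w) * C + c


def nchw_to_nhwc_row_order(shape):
    """For each NHWC flat position, the NCHW flat position of that element.

    shape is (C, H, W) of the feature map that gets flattened.
    """
    C, H, W = shape
    order = [0] * (C * H * W)
    for c in range(C):
        for h in range(H):
            for w in range(W):
                order[nhwc_flat_index(c, h, w, shape)] = nchw_flat_index(c, h, w, shape)
    return order


def reorder_rows(kernel_rows, order):
    """Reorder dense-kernel rows (one row per flattened input element)."""
    return [kernel_rows[i] for i in order]


# Example: 2 channels, height 1, width 2 (made-up sizes for illustration).
order = nchw_to_nhwc_row_order((2, 1, 2))
print(order)
print(reorder_rows(["row0", "row1", "row2", "row3"], order))
```

If this is indeed the cause, only the first dense kernel after the flatten would need its rows permuted before being loaded into the NHWC graph; the remaining variables could be restored unchanged.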

Answer 1

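In general, you want to wrap your graph construction in a function so that you can conditionally rebuild the graph for the prediction case, since a lot of the graph usually changes when going from training to prediction. As you have found, the NCHW and NHWC versions of, say, convolution layers are actually different ops in the graph proto, and they are hard-coded this way because the GPU optimizations are only available for one format.

Editing the graph proto is very hard to do correctly, which is why most TensorFlow code that does this follows the pattern above. At a very high level: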

import tensorflow as tf

def build_graph(data_format='NCHW'):
    # Conditionally use proper ops based on data_format arg
    pass

training_graph = tf.Graph()
with training_graph.as_default():
    build_graph(data_format='NCHW')

with tf.Session(graph=training_graph) as sess:
    # train
    # checkpoint session
    pass

prediction_graph = tf.Graph()
with prediction_graph.as_default():
    build_graph(data_format='NHWC')
    # load checkpoint
    # freeze graph

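Note that the tf.estimator.Estimator framework makes this relatively easy. You can use the mode argument of model_fn to decide the data format, then have two different input_fn functions for training and prediction, and the framework does the rest. You can find an end-to-end example here: https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10_estimator/cifar10_main.py#L77 (I have linked to the relevant line).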
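
The mode-based choice could look roughly like the sketch below. choose_data_format is a hypothetical helper, not part of TensorFlow or of the linked example; the mode strings are written out as plain constants that mirror the values of tf.estimator.ModeKeys so the snippet runs without TensorFlow, and the returned names follow the channels_first / channels_last convention used by the tf.layers data_format argument.

```python
# Plain-string stand-ins that mirror tf.estimator.ModeKeys.TRAIN / EVAL / PREDICT.
TRAIN, EVAL, PREDICT = "train", "eval", "infer"


def choose_data_format(mode, gpu_available):
    """Hypothetical helper: pick the layout a model_fn would build with.

    NCHW (channels_first) is used only when training or evaluating on a GPU;
    prediction graphs and CPU-only runs get NHWC (channels_last), which the
    CPU kernels support.
    """
    if mode == PREDICT or not gpu_available:
        return "channels_last"   # NHWC
    return "channels_first"      # NCHW


for mode in (TRAIN, EVAL, PREDICT):
    print(mode, choose_data_format(mode, gpu_available=True))
```

Inside a model_fn, the result would be passed as the data_format argument of each layer, so the training graph and the prediction graph are built from the same code with different layouts.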

Freezing a graph with a different data format
