Question

I am trying to run this ResNet with a few changes: https://github.com/tensorflow/models/tree/master/official/resnet

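After looking up the error, I understand the problem is one of the following:

Tensors belong to different graphs, but I cannot figure out how that came about, since I did not create any graph myself.
I have uninitialized variables in my replacement for the parser function.

If it is the initialization, how should I initialize them when using an Estimator, which initializes variables and creates the session automatically?

This is the error: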

ValueError: Tensor("IsVariableInitialized:0", shape=(), dtype=bool) must be from the same graph as Tensor("report_uninitialized_variables/IsVariableInitialized:0", shape=(), dtype=bool).

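The full code is very large, so I am only providing the changes I made (it runs without them). The rest of the code is untouched (it is the repo linked above).

This is the original parser function (it reads from binary files):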

def parse_record(raw_record, is_training):
  """Parse CIFAR-10 image and label from a raw record."""
  # Convert bytes to a vector of uint8 that is record_bytes long.
  record_vector = tf.decode_raw(raw_record, tf.uint8)

  # The first byte represents the label, which we convert from uint8 to int32
  # and then to one-hot.
  label = tf.cast(record_vector[0], tf.int32)
  label = tf.one_hot(label, _NUM_CLASSES)

  # The remaining bytes after the label represent the image, which we reshape
  # from [depth * height * width] to [depth, height, width].
  depth_major = tf.reshape(record_vector[1:_RECORD_BYTES],
                       [_NUM_CHANNELS, _HEIGHT, _WIDTH])

  # Convert from [depth, height, width] to [height, width, depth], and cast as
  # float32.
  image = tf.cast(tf.transpose(depth_major, [1, 2, 0]), tf.float32)

  image = preprocess_image(image, is_training)

  return image, label

This is my replacement, which reads from TFRecords:

def parse_record(raw_record, is_training):
  mode = 'train' if is_training else 'val'
  feature = {mode + '/image': tf.FixedLenFeature([], tf.string),
           mode + '/label': tf.FixedLenFeature([], tf.int64)}
  filename_queue = tf.train.string_input_producer([raw_record], num_epochs=1)
  reader = tf.TFRecordReader()
  _, serialized_example = reader.read(filename_queue)
  features = tf.parse_single_example(serialized_example, features=feature)
  label = tf.cast(features['train/label'], tf.int32)
  label = tf.one_hot(label, _NUM_CLASSES)
  image = tf.decode_raw(features['train/image'], tf.float32)
  image = tf.reshape(image, [_HEIGHT, _WIDTH, _NUM_CHANNELS])
  image = preprocess_image(image, is_training)
  return image, label

This is where the Estimator is created (I did not modify this part):

def resnet_main(flags, model_function, input_function):
  """Shared main loop for ResNet Models."""

  # Using the Winograd non-fused algorithms provides a small performance boost.
  os.environ['TF_ENABLE_WINOGRAD_NONFUSED'] = '1'

  if flags.multi_gpu:
    validate_batch_size_for_multi_gpu(flags.batch_size)

    # There are two steps required if using multi-GPU: (1) wrap the model_fn,
    # and (2) wrap the optimizer. The first happens here, and (2) happens
    # in the model_fn itself when the optimizer is defined.
    model_function = tf.contrib.estimator.replicate_model_fn(
        model_function,
        loss_reduction=tf.losses.Reduction.MEAN)

  # Create session config based on values of inter_op_parallelism_threads and
  # intra_op_parallelism_threads. Note that we default to having
  # allow_soft_placement = True, which is required for multi-GPU and not
  # harmful for other modes.
  session_config = tf.ConfigProto(
      inter_op_parallelism_threads=flags.inter_op_parallelism_threads,
      intra_op_parallelism_threads=flags.intra_op_parallelism_threads,
      allow_soft_placement=True)

  # Set up a RunConfig to save checkpoint and set session config.
  run_config = tf.estimator.RunConfig().replace(save_checkpoints_secs=1e9,
                                                session_config=session_config)
  classifier = tf.estimator.Estimator(
      model_fn=model_function, model_dir=flags.model_dir, config=run_config,
      params={
          'resnet_size': flags.resnet_size,
          'data_format': flags.data_format,
          'batch_size': flags.batch_size,
          'multi_gpu': flags.multi_gpu,
          'version': flags.version,
      })

  for _ in range(flags.train_epochs // flags.epochs_between_evals):
    train_hooks = hooks_helper.get_train_hooks(
        flags.hooks,
        batch_size=flags.batch_size,
        benchmark_log_dir=flags.benchmark_log_dir)

    print('Starting a training cycle.')

    def input_fn_train():
      return input_function(True, flags.data_dir, flags.batch_size,
                            flags.epochs_between_evals,
                            flags.num_parallel_calls, flags.multi_gpu)

    classifier.train(input_fn=input_fn_train, hooks=train_hooks,
                     max_steps=flags.max_train_steps)

    print('Starting to evaluate.')
    # Evaluate the model and print results
    def input_fn_eval():
      return input_function(False, flags.data_dir, flags.batch_size,
                            1, flags.num_parallel_calls, flags.multi_gpu)

    # flags.max_train_steps is generally associated with testing and profiling.
    # As a result it is frequently called with synthetic data, which will
    # iterate forever. Passing steps=flags.max_train_steps allows the eval
    # (which is generally unimportant in those circumstances) to terminate.
    # Note that eval will run for max_train_steps each loop, regardless of the
    # global_step count.
    eval_results = classifier.evaluate(input_fn=input_fn_eval,
                                       steps=flags.max_train_steps)
    print(eval_results)

    if flags.benchmark_log_dir is not None:
      benchmark_logger = logger.BenchmarkLogger(flags.benchmark_log_dir)
      benchmark_logger.log_estimator_evaluation_result(eval_results)

Answer 1

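A brute-force solution. I don't know what I am doing, but I decided to post the solution that worked for me even though I can't explain it, because it may help another adventurer.

Remove the following lines from the parse_record function: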

filename_queue = tf.train.string_input_producer([raw_record], num_epochs=1)
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)

Then pass raw_record (a tensor, dtype=string, value=path/to/tfrecordfile) instead of serialized_example as the source for the parse_single_example function:

features = tf.parse_single_example(raw_record, features=feature)
