TensorFlow program sometimes works and sometimes throws different errors related to reshape nodes at different points in training, with the same random seed

Date: 2018-03-17 16:59:12

Tags: tensorflow

I have run into a particularly strange TensorFlow problem. (TensorFlow 1.4.1 and Python 2.7)

The errors

Running the same program, I have seen several different errors. Here is one example:

W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Input to reshape is a tensor with 122496 values, but the requested shape has 0
     [[Node: optimizer/gradients/energy_2/map/while/Gather_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](optimizer/gradients/energy_2/map/while/mul_grad/tuple/control_dependency_1, optimizer/gradients/energy_2/map/while/Gather_grad/concat)]]
[... the same warning is repeated many more times in the log ...]
2018/03/15 18:52:25 ERROR|--|Traceback (most recent call last):
  File "experiment_runner.py", line 140, in experimentset
    results = e.run()
  File "experiment_runner.py", line 69, in run
    results = run_fn()
  File "experiment_runner.py", line 100, in traintest
    return tt.run(self.exp_specs, self.data, model)
  File "/s/chopin/a/grad/jonbyrd/protqa/protqa/experiment/train_test.py", line 149, in run
    return self._fit_model(exp_specs, data, model)
  File "/s/chopin/a/grad/jonbyrd/protqa/protqa/experiment/train_test.py", line 56, in _fit_model
    self.train_proteins_epoch(data["train"], model, exp_specs["args"]["minibatch_size"])
  File "/s/chopin/a/grad/jonbyrd/protqa/protqa/experiment/train_test.py", line 186, in train_proteins_epoch
    model.train(minibatch)
  File "/s/chopin/a/grad/jonbyrd/protqa/protqa/modeling/models/tf_model.py", line 169, in train
    results = self._train(data, options=run_options, run_metadata=run_metadata, **kwargs)
  File "/s/chopin/a/grad/jonbyrd/protqa/protqa/modeling/models/tf_model.py", line 113, in _train
    results = self.run_graph([self.train_op, self.loss], data, "train", **kwargs)
  File "/s/chopin/a/grad/jonbyrd/protqa/protqa/modeling/models/protnet.py", line 135, in run_graph
    return self.sess.run(outputs, feed_dict=feed_dict, options=options, run_metadata=run_metadata)
  File "/s/jawar/j/nobackup/protein_learning/virtualenv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/s/jawar/j/nobackup/protein_learning/virtualenv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/s/jawar/j/nobackup/protein_learning/virtualenv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/s/jawar/j/nobackup/protein_learning/virtualenv/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
InvalidArgumentError: Input to reshape is a tensor with 122496 values, but the requested shape has 0
     [[Node: optimizer/gradients/energy_2/map/while/Gather_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](optimizer/gradients/energy_2/map/while/mul_grad/tuple/control_dependency_1, optimizer/gradients/energy_2/map/while/Gather_grad/concat)]]

Caused by op u'optimizer/gradients/energy_2/map/while/Gather_grad/Reshape', defined at:
  File "experiment_runner.py", line 332, in <module>
    main()
  File "experiment_runner.py", line 328, in main
    e.run()
  File "experiment_runner.py", line 69, in run
    results = run_fn()
  File "experiment_runner.py", line 140, in experimentset
    results = e.run()
  File "experiment_runner.py", line 69, in run
    results = run_fn()
  File "experiment_runner.py", line 99, in traintest
    model = tt.build_model(self.exp_specs, self.data)
  File "/s/chopin/a/grad/jonbyrd/protqa/protqa/experiment/train_test.py", line 141, in build_model
    model = eval(hparams["name"] + "(exp_specs, data['train'])")
  File "<string>", line 1, in <module>
  File "/s/chopin/a/grad/jonbyrd/protqa/protqa/modeling/models/protnet.py", line 110, in __init__
    self.setup_loss()
  File "/s/chopin/a/grad/jonbyrd/protqa/protqa/modeling/models/tf_model.py", line 90, in setup_loss
    self.train_op = self.hparams["optimizer"](self.loss, **self.hparams["optimizer_args"])
  File "/s/chopin/a/grad/jonbyrd/protqa/protqa/modeling/optimizers.py", line 9, in tf_sgd
    return tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
  File "/s/jawar/j/nobackup/protein_learning/virtualenv/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 343, in minimize
    grad_loss=grad_loss)
  File "/s/jawar/j/nobackup/protein_learning/virtualenv/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 414, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/s/jawar/j/nobackup/protein_learning/virtualenv/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 581, in gradients
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "/s/jawar/j/nobackup/protein_learning/virtualenv/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 353, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/s/jawar/j/nobackup/protein_learning/virtualenv/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 581, in <lambda>
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "/s/jawar/j/nobackup/protein_learning/virtualenv/lib/python2.7/site-packages/tensorflow/python/ops/array_grad.py", line 373, in _GatherGrad
    values = array_ops.reshape(grad, values_shape)
  File "/s/jawar/j/nobackup/protein_learning/virtualenv/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3938, in reshape
    "Reshape", tensor=tensor, shape=shape, name=name)
  File "/s/jawar/j/nobackup/protein_learning/virtualenv/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/s/jawar/j/nobackup/protein_learning/virtualenv/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/s/jawar/j/nobackup/protein_learning/virtualenv/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

...which was originally created as op u'energy_2/map/while/Gather', defined at:
  File "experiment_runner.py", line 332, in <module>
    main()
[elided 6 identical lines from previous traceback]
  File "<string>", line 1, in <module>
  File "/s/chopin/a/grad/jonbyrd/protqa/protqa/modeling/models/protnet.py", line 77, in __init__
    dtype=tf.float32, parallel_iterations=32)
  File "/s/jawar/j/nobackup/protein_learning/virtualenv/lib/python2.7/site-packages/tensorflow/python/ops/functional_ops.py", line 389, in map_fn
    swap_memory=swap_memory)
  File "/s/jawar/j/nobackup/protein_learning/virtualenv/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2816, in while_loop
    result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
  File "/s/jawar/j/nobackup/protein_learning/virtualenv/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2640, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "/s/jawar/j/nobackup/protein_learning/virtualenv/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2590, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "/s/jawar/j/nobackup/protein_learning/virtualenv/lib/python2.7/site-packages/tensorflow/python/ops/functional_ops.py", line 379, in compute
    packed_fn_values = fn(packed_values)
  File "/s/chopin/a/grad/jonbyrd/protqa/protqa/modeling/models/protnet.py", line 75, in <lambda>
    ], None, in_dims=nv, in_dists=self.in_dists, **args)[0],
  File "/s/chopin/a/grad/jonbyrd/protqa/protqa/modeling/models/nn_components.py", line 447, in energy
    return tf.reshape(tf.reduce_mean(tf.einsum('abi,abj->abij', (tf.expand_dims(verts, axis=1) * tf.gather(verts, hood_indices)), dists), axis=[0,1]), [in_dims*in_dists]), None
  File "/s/jawar/j/nobackup/protein_learning/virtualenv/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 2486, in gather
    params, indices, validate_indices=validate_indices, name=name)
  File "/s/jawar/j/nobackup/protein_learning/virtualenv/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1834, in gather
    validate_indices=validate_indices, name=name)
  File "/s/jawar/j/nobackup/protein_learning/virtualenv/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)

InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 122496 values, but the requested shape has 0
     [[Node: optimizer/gradients/energy_2/map/while/Gather_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](optimizer/gradients/energy_2/map/while/mul_grad/tuple/control_dependency_1, optimizer/gradients/energy_2/map/while/Gather_grad/concat)]]

However, I have run into several different errors. For this node in my graph:

[[Node: optimizer/gradients/energy_1/map/while/Gather_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](optimizer/gradients/energy_1/map/while/mul_grad/tuple/control_dependency_1, optimizer/gradients/energy_1/map/while/Gather_grad/concat)]]

here are some of the errors I have seen:

Size 1 must be non-negative, not -1231271574
Size 1 must be non-negative, not -1225669337
Input to reshape is a tensor with 122496 values, but the requested shape has 0
Input to reshape is a tensor with 122496 values, but the requested shape has 1715491170492
Input to reshape is a tensor with 122496 values, but the requested shape has 1693172050944
Input to reshape is a tensor with 122496 values, but the requested shape has 1706639062128

And for this node in my graph:

[[Node: optimizer/gradients/energy_1/map/while/Mean_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](optimizer/gradients/energy_1/map/while/TensorArrayWrite/TensorArrayWriteV3_grad/tuple/control_dependency, optimizer/gradients/energy_1/map/while/Mean_grad/DynamicStitch/_203)]]

I have seen errors like these:

Size 0 must be non-negative, not -1237175937
Input to reshape is a tensor with 512 values, but the requested shape has 0

I have also gotten a "Nan in summary histogram" error, but I'll assume that one was caused by the model diverging.

I don't understand why I am getting these shape-related errors at runtime, partway through training. I also don't understand why the values in the errors change from run to run.

Context

When the same program is run with the same hyperparameters on the same data, using the same numpy and TensorFlow random seeds, it sometimes runs without any problems, but it usually throws one of these errors at varying points in training. Sometimes that happens in the first epoch, sometimes only after many training epochs (even after 40+ epochs, shortly before training would have finished).

Strangely, this seems to depend heavily on the number of latent features / convolution filters in the layer preceding the one that throws the error. Smaller filter counts (such as 16, 32, 64, and 128) almost always produce errors tied to the first graph node I mentioned, while 512 filters mostly produce errors tied to the second node. With these hyperparameter values the program fails on 7-10 out of 10 runs.

However, running the program with 1 or 1024 filters succeeds on 10 out of 10 runs, which baffles me.

The program

The program is part of a research framework for deep learning on protein structures. The part that gives me the errors is a graph convolution / message-passing network that downsamples variable-size/shape graphs into a single latent representation. The filter counts mentioned in the previous section correspond to the number of latent features per node in the graph.

Here is the downsampling method:

def energy(input, _, in_dims, in_dists, **kwargs):
    '''Params:
        input: a tuple representing a single graph containing:
            a 2d tensor of vertex representations(vertices x features)
            a 3d tensor of distance metrics between nodes (vertices x neighbors x distances)
            a 2d tensor containing indices of the neighbors of each vertex in the first tensor(vertices x neighbor indices)
        in_dims: number of incoming features for each vertex
        in_dists: number of distance metrics

    Returns: a 1d tensor of size [in_dims*in_dists] which is the sum over all pairs of neighboring vertices of
    (the outer product of (the elementwise product of the two vertices) and the distances).
    '''
    verts, dists, hood_indices = input
    return tf.reshape(tf.reduce_mean(tf.einsum('abi,abj->abij', (tf.expand_dims(verts, axis=1) * tf.gather(verts, hood_indices)), dists), axis=[0,1]), [in_dims*in_dists]), None
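
For reference, here is a minimal standalone sketch of the same computation (TF 1.x style, with hypothetical small sizes: V vertices, K neighbors, F features, D distance metrics). Once V is known, every intermediate shape is fixed, which is part of why the runtime reshape failures surprise me:

import numpy as np
import tensorflow as tf

V, K, F, D = 6, 3, 4, 2  # hypothetical sizes: vertices, neighbors, features, distance metrics
verts = tf.constant(np.random.rand(V, F).astype(np.float32))
dists = tf.constant(np.random.rand(V, K, D).astype(np.float32))
hood_indices = tf.constant(np.random.randint(0, V, size=(V, K)).astype(np.int32))

gathered = tf.gather(verts, hood_indices)            # [V, K, F]
pairwise = tf.expand_dims(verts, axis=1) * gathered  # [V, K, F], broadcast over neighbors
outer = tf.einsum('abi,abj->abij', pairwise, dists)  # [V, K, F, D]
energy_out = tf.reshape(tf.reduce_mean(outer, axis=[0, 1]), [F * D])

with tf.Session() as sess:
    print(sess.run(energy_out).shape)  # (8,) == (F * D,)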

And here is the map_fn that calls that method, where layer_fn is the method above:

              # downsample each graph using layer_fn
                input = tf.map_fn(
                    lambda ind, data=input[0], merge_fn=layer_fn, nv=input[0].get_shape().as_list()[-1], args=args: merge_fn(
                        [tf.slice(data, [tf.squeeze(tf.slice(ind, [0], [1])), 0],
                                 [tf.squeeze(tf.slice(ind, [1], [1])), nv], name="merge_vertex_slice"),
                         tf.slice(self.distances, [tf.squeeze(tf.slice(ind, [0], [1])), 0, 0],
                                  [tf.squeeze(tf.slice(ind, [1], [1])), self.in_nhood_size, self.in_dists], name="merge_distance_slice"),
                         tf.slice(tf.squeeze(self.in_hood_indices), [tf.squeeze(tf.slice(ind, [0], [1])), 0],
                                  [tf.squeeze(tf.slice(ind, [1], [1])), self.in_nhood_size], name="merge_index_slice"),
                         ], None, in_dims=nv, in_dists=self.in_dists, **args)[0],
                    tf.stack([tf.cumsum(self.graph_orders, exclusive=True), self.graph_orders], axis=-1),
                    dtype=tf.float32, parallel_iterations=32)

Stack Overflow won't let me post the class that builds the computation graph, because it would put my post over the character limit.

When I downsample the graphs using a top_k method instead, the program runs without errors.

Confusion

I don't understand why I get these reshape errors after successful training epochs, or why the number of filters affects the problem in this way. I also don't understand why I get different values in the reshape errors each time. The tensor dimensions should all be fixed, except for the number of examples in a minibatch (which I handle via the map_fn) and the number of vertices in each example graph.
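
For what it's worth, a debugging sketch I could drop inside the energy function (not part of the original code; tf.Print is the TF 1.x logging op) to watch the runtime shapes and the largest neighbor index on each step, since only the vertex count should vary from graph to graph:

verts = tf.Print(verts,
                 [tf.shape(verts), tf.shape(dists), tf.shape(hood_indices), tf.reduce_max(hood_indices)],
                 message="energy inputs (shapes, max neighbor index): ",
                 summarize=10)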

I have had a hard time troubleshooting this and would greatly appreciate outside input. Thanks!

2 answers:

Answer 0 (score: 0)

My problem ended up being that I was passing indices to tf.gather() that were larger than the size of the tensor I was gathering from (the hood_indices in the energy function). I am not sure how that led to the errors I was seeing, but fixing it solved my problem.
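
For anyone hitting the same thing: my understanding is that on the GPU, tf.gather in this version does not validate its indices, so out-of-range entries produce undefined results rather than a clean error, which would be consistent with the garbage shape values above. A minimal guard (a sketch, assuming verts and hood_indices as in the energy function) that turns the silent corruption into an explicit failure:

num_verts = tf.shape(verts)[0]
index_check = tf.Assert(
    tf.reduce_all(tf.logical_and(hood_indices >= 0, hood_indices < num_verts)),
    ["hood_indices out of range; max index and vertex count:",
     tf.reduce_max(hood_indices), num_verts])
with tf.control_dependencies([index_check]):
    gathered = tf.gather(verts, hood_indices)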

Answer 1 (score: 0)

I had a similar error: "tensorflow.python.framework.errors_impl.InvalidArgumentError: Size 0 must be non-negative, not -1610612736 [Op:Reshape]"

It turned out to be related to tf.repeat. Basically, the simple code below will give you an OOM error:

a = tf.range(5000000)
b = tf.concat([tf.zeros(5000000-1, dtype=tf.int32),tf.constant([5000000], dtype=tf.int32)], axis=0)
c = tf.repeat(a,b)

The solution is to adjust the arguments to tf.repeat, e.g. make input smaller (drop useless values) and remove the zeros from repeats.

For more details: https://github.com/tensorflow/tensorflow/issues/46648#issuecomment-876168035
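
For example, a sketch of that workaround on the tensors above: dropping the entries whose repeat count is zero gives an identical result while avoiding the huge intermediate:

mask = b > 0
c = tf.repeat(tf.boolean_mask(a, mask), tf.boolean_mask(b, mask))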