TensorFlow: custom op produces NaN when used in two networks simultaneously

Asked: 2016-11-22 07:10:38

Tags: python neural-network tensorflow

I have written the following custom op, with a gradient, to binarize a real-valued vector. (This code is inspired by https://gist.github.com/harpone/3453185b41d8d985356cbe5e57d67342.)

import numpy as np
import tensorflow as tf
from tensorflow.python.framework import ops

def py_func(func, inp, Tout, stateful=True, name=None, grad=None):

    # Need to generate a unique name to avoid duplicates:
    rnd_name = name+'PyFuncGrad' + str(np.random.randint(0, 1E+8))

    tf.RegisterGradient(rnd_name)(grad)  # see _MyBinarizerGrad for grad example
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": rnd_name}):
        return tf.py_func(func, inp, Tout, stateful=stateful, name=name)

def mycustombinarizer(x):
    if _test_:
        return x>0.5
    sess_ = tf.Session()
    probs = tf.constant(x)
    probs = tf.reshape(probs,[-1])
    probs = tf.pack([1-probs, probs], axis=1)
    probs = tf.log(probs/(1-probs))
    indexes = tf.multinomial(probs, 1)
    indexes = tf.cast(tf.reshape(indexes, list(x.shape)),tf.float32)
    with sess_.as_default():
        binary_x = indexes.eval()
    return binary_x

def binarizer(x, name=None):
    with ops.name_scope(name, "Binarizer", [x]) as name:
        sqr_x = py_func(mycustombinarizer,
                        [x],
                        [tf.float32],
                        name=name,
                        grad=_MyBinarizerGrad)  # <-- here's the call to the gradient
        return tf.reshape(sqr_x[0], tf.shape(x))

def _MyBinarizerGrad(op, grad):
    return grad
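In plain NumPy terms, the op above samples a 0/1 value for each entry and passes the incoming gradient straight through unchanged. Here is an illustrative sketch (the helper names `binarizer_forward`/`binarizer_backward` are hypothetical, not part of the original code):

```python
import numpy as np

def binarizer_forward(x, rng):
    # Each entry becomes 1 with probability x, else 0 (a Bernoulli draw),
    # matching the multinomial sampling in mycustombinarizer.
    return (rng.uniform(size=x.shape) < x).astype(np.float32)

def binarizer_backward(upstream_grad):
    # _MyBinarizerGrad returns the incoming gradient unchanged,
    # i.e. a straight-through estimator.
    return upstream_grad

rng = np.random.default_rng(0)
x = np.array([0.1, 0.5, 0.9], dtype=np.float32)
y = binarizer_forward(x, rng)
dx = binarizer_backward(np.array([0.2, -0.3, 0.4], dtype=np.float32))
print(y)   # entries are 0.0 or 1.0
print(dx)  # identical to the upstream gradient
```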

This works perfectly fine when only one network uses this op. However, if I create two copies of the same network, both using this binarizer op, and try to optimize the combined cost (cost_net1 + cost_net2), the cost becomes NaN after a few iterations.

def network_(x, netname):
    with tf.variable_scope(netname):
        x = someoperation(x)
        ...
        ret_tensor = binarizer(x,netname)

ypred1 = network_(input,'net1')
ypred2 = network_(input,'net2')
cost = costfn(ypred1,ytrue)+costfn(ypred2,ytrue)

Can anyone tell me what is wrong with my implementation of this custom function? Is the problem the session used to evaluate indexes.eval() inside mycustombinarizer, is it an issue with name_scope/variable_scope, or is it something else entirely? I am stuck here.

2 Answers:

Answer 0 (score: 3)

Try this:

import numpy as np
import tensorflow as tf
from tensorflow.python.framework import function

@function.Defun()
def BinarizerGrad(unused_x, dy):
  # Backprop dy directly.
  return dy

@function.Defun(grad_func=BinarizerGrad)
def Binarizer(x):
  # Whatever forward function you want goes here.
  return tf.floor(x + tf.random_uniform(tf.shape(x)))

g = tf.Graph()
with g.as_default():
  x = tf.placeholder(tf.float32)
  y = Binarizer(x)
  dy = tf.placeholder(tf.float32)
  dx = tf.gradients(y, x, grad_ys=dy)

with tf.Session(graph=g) as sess:
  x_val = np.array([[1, 2, 3], [0.5, 0.3, 0.2]])
  dy_val = np.array([[1, 0, 1], [0., 0.1, 0.9]])
  for v in sess.run([x, y, dx], feed_dict={x : x_val, dy: dy_val}):
    print(v)
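As a side note, the floor-plus-uniform trick in this answer draws a Bernoulli sample with success probability x, which is the same distribution the question's multinomial-based binarizer samples from. A quick NumPy check of that equivalence (illustrative only, not from the answer):

```python
import numpy as np

# floor(x + U) with U ~ Uniform[0, 1) equals 1 exactly when U >= 1 - x,
# which happens with probability x: a Bernoulli(x) sample.
rng = np.random.default_rng(0)
x = np.full(100_000, 0.3)
samples = np.floor(x + rng.uniform(size=x.shape))
print(samples.mean())  # should be close to 0.3
```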

Answer 1 (score: 0)

I don't think building a graph, launching a session, and running it from inside a py_func is well supported. In this case you can drop all of that and just write the computation in straight TensorFlow ops; everything should then work.