I've spent about two hours on this but can't find a solution. What I need is probably a boolean mask, but I'm still missing the next step.
My neural network wasn't learning, so I started inspecting every step it performs. Sure enough, I found a problem: because of sparsity in the input layer, I am propagating too many bias terms. The only special thing about my setup is that the last time matrix is a zero matrix. I'll first show a screenshot of the notebook, then the code.
Screenshot:
I don't want the bias term to be added wherever the whole time matrix is a zero matrix. I thought I could maybe perform the operation on a boolean-mask-filtered matrix?
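To make the idea concrete, here is roughly the mask I have in mind, sketched in NumPy (just a sketch of the intent, not TensorFlow code):
import numpy as np
dim = 4
tensor = np.random.rand(1, 3, 4, dim)   # batch x time x events x dim
tensor[0][2] = np.zeros((4, dim))       # last time matrix is all zeros
# True for time steps whose whole (events x dim) matrix is zero
zero_time = np.all(tensor == 0, axis=(2, 3))
print(zero_time)  # [[False False  True]]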
Here is the code:
import tensorflow as tf
import numpy as np
dim = 4
# batch x time x events x dim
tensor = np.random.rand(1, 3, 4, dim)
zeros_last_time = np.zeros((4, dim))
tensor[0][2] = zeros_last_time
dtype = tf.float64
input_layer = tf.placeholder(dtype, shape=(None, None, 4, dim))
# These are supposed to perform operations on the non-zero times
Wn = tf.Variable(
    tf.truncated_normal(dtype=dtype, shape=(dim,), mean=0, stddev=0.01),
    name="Wn")
bn = tf.Variable(
    tf.truncated_normal(dtype=dtype, shape=(1,), mean=0, stddev=0.01),
    name="bn")
# this is the op I want to be performed only on non-zero times
op = tf.einsum('bted,d->bte', input_layer, Wn) + bn
s = tf.Session()
glob_vars = tf.global_variables_initializer()
s.run(glob_vars)
# first let's see what the bias term is
s.run(bn, feed_dict={input_layer: tensor})
s.run(op, feed_dict={input_layer: tensor})
EDIT: So I believe tf.where is what I need.
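For reference, tf.where(condition, x, y) picks elements from x where the condition is True and from y where it is False; a minimal sketch of that behaviour:
import tensorflow as tf
cond = tf.constant([True, False, True])
picked = tf.where(cond, tf.ones(3), tf.zeros(3))  # elementwise select
with tf.Session() as s:
    print(s.run(picked))  # [1. 0. 1.]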
Answer 0 (score: 0):
Maybe a nice solution is to use tf.where to create a mask that is zero where the input is zero (along the last dimension) and one otherwise. Once you have that mask, you can multiply it by the bias to get the result. Here is my solution:
import tensorflow as tf
import numpy as np
dim = 4
# batch x time x events x dim
tensor = np.random.rand(1, 3, 4, dim)
zeros_last_time = np.zeros((4, dim))
tensor[0][2] = zeros_last_time
dtype = tf.float64
input_layer = tf.placeholder(tf.float64, shape=(None, None, 4, dim))
# These are supposed to perform operations on the non-zero times
Wn = tf.Variable(
    tf.truncated_normal(dtype=dtype, shape=(dim,), mean=0, stddev=0.01),
    name="Wn")
bn = tf.Variable(
    tf.truncated_normal(dtype=dtype, shape=(1,), mean=0, stddev=0.01),
    name="bn")
# True where the whole event vector (last dimension) is zero
all_zero = tf.reduce_all(tf.equal(input_layer, 0.0), axis=-1)
# mask of shape (batch, time, events): 0 for all-zero inputs, 1 otherwise
bias = bn * tf.where(all_zero,
                     tf.zeros(tf.shape(input_layer)[:-1], dtype=dtype),
                     tf.ones(tf.shape(input_layer)[:-1], dtype=dtype))
# this is the op I want to be performed only on non-zero times
op = tf.einsum('bted,d->bte', input_layer, Wn) + bias
s = tf.Session()
glob_vars = tf.global_variables_initializer()
s.run(glob_vars)
# first let's see what the bias term is
print(s.run(bn, feed_dict={input_layer: tensor}))
print(s.run(op, feed_dict={input_layer: tensor}))
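Since both the einsum term and the bias vanish for the all-zero time step, the corresponding rows of op should come out exactly zero, which is a quick way to check the mask:
out = s.run(op, feed_dict={input_layer: tensor})
assert np.allclose(out[0, 2], 0.0)  # the zeroed time step gets no bias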
Answer 1 (score: 0):
I managed to get the bias right, but then found that the dimensions get messed up, so this is only a partial answer:
import tensorflow as tf
import numpy as np
dim = 4
# batch x time x events x dim
tensor = np.random.rand(1, 3, 4, dim)
zeros_last_time = np.zeros((4, dim))
tensor[0][2] = zeros_last_time
dtype = tf.float64
input_layer = tf.placeholder(dtype, shape=(None, None, 4, dim))
# These are supposed to perform operations on the non-zero times
Wn = tf.Variable(
    tf.truncated_normal(dtype=dtype, shape=(dim,), mean=0, stddev=0.01),
    name="Wn")
bn = tf.Variable(
    tf.truncated_normal(dtype=dtype, shape=(1,), mean=0, stddev=0.01),
    name="bn")
# True wherever an element of the input equals zero
zeros = tf.equal(input_layer, tf.cast(tf.zeros(tf.shape(input_layer)[2:]),
                                      tf.float64))
# per-element mask: 0 where the input is zero, 1 elsewhere
where_ = tf.where(zeros, tf.zeros(tf.shape(input_layer)),
                  tf.ones(tf.shape(input_layer)))
bias = bn * tf.cast(where_, tf.float64)
# will fail when evaluated: bias has shape (batch, time, events, dim)
# but the einsum result has shape (batch, time, events)
op = tf.einsum('bted,d->bte', input_layer, Wn) + bias
print(bias)
s = tf.Session()
glob_vars = tf.global_variables_initializer()
s.run(glob_vars)
feed_dict = {input_layer: tensor}
s.run(bias, feed_dict)
And these two ops take care of the bias:
biases = tf.slice(bias, [0, 0, 0, 0], [1, 3, 1, 4])  # hard-coded to this example's shape
squeezed_biases = tf.squeeze(biases)                 # shape (3, 4)
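With that, the addition goes through (a sketch tied to this example's hard-coded shapes, not a general fix):
fixed_op = tf.einsum('bted,d->bte', input_layer, Wn) + squeezed_biases
print(s.run(fixed_op, feed_dict))  # (1, 3, 4) + (3, 4) broadcast cleanly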