I've spent about two hours on this but can't find a solution. What I need is probably a boolean mask, but I'm still missing the next step.
My neural network wasn't learning, so I started inspecting every step it performs. Sure enough, I found a problem: because of sparsity in the input layer, I am propagating too many bias terms. The only special thing about my setup is that the last time matrix is a zero matrix. I'll first show a screenshot of the notebook, then the code.
Screenshot:
I don't want the bias term to be added wherever the whole time matrix is a zero matrix. I thought I could maybe perform the operation on a boolean-mask-filtered matrix?
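To make the idea concrete, here is roughly the mask I have in mind, sketched in NumPy (just a sketch of the intent, not TensorFlow code):
import numpy as np
dim = 4
tensor = np.random.rand(1, 3, 4, dim)   # batch x time x events x dim
tensor[0][2] = np.zeros((4, dim))       # last time matrix is all zeros
# True for time steps whose whole (events x dim) matrix is zero
zero_time = np.all(tensor == 0, axis=(2, 3))
print(zero_time)  # [[False False  True]]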
Here is the code:
import tensorflow as tf
import numpy as np
dim = 4
# batch x time x events x dim
tensor = np.random.rand(1, 3, 4, dim)
zeros_last_time = np.zeros((4, dim))
tensor[0][2] = zeros_last_time
dtype = tf.float64
input_layer = tf.placeholder(dtype, shape=(None, None, 4, dim))
# These are supposed to perform operations on the non-zero times
Wn = tf.Variable(
    tf.truncated_normal(dtype=dtype, shape=(dim,), mean=0, stddev=0.01),
    name="Wn")
bn = tf.Variable(
    tf.truncated_normal(dtype=dtype, shape=(1,), mean=0, stddev=0.01),
    name="bn")
# this is the op I want to be performed only on non-zero times
op = tf.einsum('bted,d->bte', input_layer, Wn) + bn
s = tf.Session()
glob_vars = tf.global_variables_initializer()
s.run(glob_vars)
# first let's see what the bias term is
s.run(bn, feed_dict={input_layer: tensor})
s.run(op, feed_dict={input_layer: tensor})
EDIT: So I believe tf.where is what I need.
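For reference, tf.where(condition, x, y) picks elements from x where the condition is True and from y where it is False; a minimal sketch of that behaviour:
import tensorflow as tf
cond = tf.constant([True, False, True])
picked = tf.where(cond, tf.ones(3), tf.zeros(3))  # elementwise select
with tf.Session() as s:
    print(s.run(picked))  # [1. 0. 1.]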
Answer 0 (score: 0):
Maybe a nice solution is to use tf.where to create a mask that is zero where the input is zero (along the last dimension) and one otherwise. Once you have that mask, you can multiply it by the bias to get the result. Here is my solution:
import tensorflow as tf
import numpy as np
dim = 4
# batch x time x events x dim
tensor = np.random.rand(1, 3, 4, dim)
zeros_last_time = np.zeros((4, dim))
tensor[0][2] = zeros_last_time
dtype = tf.float64
input_layer = tf.placeholder(tf.float64, shape=(None, None, 4, dim))
# These are supposed to perform operations on the non-zero times
Wn = tf.Variable(
    tf.truncated_normal(dtype=dtype, shape=(dim,), mean=0, stddev=0.01),
    name="Wn")
bn = tf.Variable(
    tf.truncated_normal(dtype=dtype, shape=(1,), mean=0, stddev=0.01),
    name="bn")
# True where the whole event vector (last dimension) is zero
all_zero = tf.reduce_all(tf.equal(input_layer, 0.0), axis=-1)
# mask of shape (batch, time, events): 0 for all-zero inputs, 1 otherwise
bias = bn * tf.where(all_zero,
                     tf.zeros(tf.shape(input_layer)[:-1], dtype=dtype),
                     tf.ones(tf.shape(input_layer)[:-1], dtype=dtype))
# this is the op I want to be performed only on non-zero times
op = tf.einsum('bted,d->bte', input_layer, Wn) + bias
s = tf.Session()
glob_vars = tf.global_variables_initializer()
s.run(glob_vars)
# first let's see what the bias term is
print(s.run(bn, feed_dict={input_layer: tensor}))
print(s.run(op, feed_dict={input_layer: tensor}))
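Since both the einsum term and the bias vanish for the all-zero time step, the corresponding rows of op should come out exactly zero, which is a quick way to check the mask:
out = s.run(op, feed_dict={input_layer: tensor})
assert np.allclose(out[0, 2], 0.0)  # the zeroed time step gets no bias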
Answer 1 (score: 0):
I managed to get the bias right, but then found that the dimensions get messed up, so this is only a partial answer:
import tensorflow as tf
import numpy as np
dim = 4
# batch x time x events x dim
tensor = np.random.rand(1, 3, 4, dim)
zeros_last_time = np.zeros((4, dim))
tensor[0][2] = zeros_last_time
dtype = tf.float64
input_layer = tf.placeholder(dtype, shape=(None, None, 4, dim))
# These are supposed to perform operations on the non-zero times
Wn = tf.Variable(
    tf.truncated_normal(dtype=dtype, shape=(dim,), mean=0, stddev=0.01),
    name="Wn")
bn = tf.Variable(
    tf.truncated_normal(dtype=dtype, shape=(1,), mean=0, stddev=0.01),
    name="bn")
# True wherever an element of the input equals zero
zeros = tf.equal(input_layer, tf.cast(tf.zeros(tf.shape(input_layer)[2:]),
                                      tf.float64))
# per-element mask: 0 where the input is zero, 1 elsewhere
where_ = tf.where(zeros, tf.zeros(tf.shape(input_layer)),
                  tf.ones(tf.shape(input_layer)))
bias = bn * tf.cast(where_, tf.float64)
# will fail when evaluated: bias has shape (batch, time, events, dim)
# but the einsum result has shape (batch, time, events)
op = tf.einsum('bted,d->bte', input_layer, Wn) + bias
print(bias)
s = tf.Session()
glob_vars = tf.global_variables_initializer()
s.run(glob_vars)
feed_dict = {input_layer: tensor}
s.run(bias, feed_dict)
And these two ops take care of the bias:
biases = tf.slice(bias, [0, 0, 0, 0], [1, 3, 1, 4])  # hard-coded to this example's shape
squeezed_biases = tf.squeeze(biases)                 # shape (3, 4)
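With that, the addition goes through (a sketch tied to this example's hard-coded shapes, not a general fix):
fixed_op = tf.einsum('bted,d->bte', input_layer, Wn) + squeezed_biases
print(s.run(fixed_op, feed_dict))  # (1, 3, 4) + (3, 4) broadcast cleanly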