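I am trying to implement a TensorFlow op that performs a weighted average, based on this paper: https://thijsvogels.nl/kpcn/bako2017kpcn.pdf
For each output pixel, the op multiplies the values of the pixels in its neighborhood by that pixel's weights and sums the products.
The current implementation is very slow, so I am looking for any suggestions on how to optimize this code.
inputs.shape is [1, 740, 1300, 3]
weights.shape is [1, 720, 1280, 441]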
import tensorflow as tf

def weighted_average(inputs, weights):
    with tf.name_scope("weighted_average", "weighted_average", [inputs, weights]) as scope:
        in_shape = inputs.get_shape().as_list()
        w_shape = weights.get_shape().as_list()
        n_channels = in_shape[3]
        # Process each color channel separately.
        xs = tf.split(inputs, n_channels, axis=3)
        # The input is larger than the weight map by `pad` pixels on each side.
        pad = (in_shape[1] - w_shape[1]) // 2
        kernel_size = pad * 2 + 1
        for index in range(n_channels):
            x = xs[index]
            x_stack = []
            # Collect one shifted copy of the channel per kernel position.
            for i in range(kernel_size):
                for j in range(kernel_size):
                    x_stack.append(x[:, i:x.shape[1] - 2 * pad + i, j:x.shape[2] - 2 * pad + j, :])
            x_stack = tf.concat(x_stack, axis=3)
            # keep_dims is the TF 1.x spelling; newer releases use keepdims.
            x = tf.reduce_sum(tf.multiply(x_stack, weights), axis=3, keep_dims=True)
            xs[index] = x
        return tf.concat(xs, axis=3)
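When trying faster variants of this op, it can help to have a small reference to compare the results against. The following is only a NumPy sketch of the same computation, not the paper's implementation. The name weighted_average_reference is made up for illustration, and it assumes the last axis of weights is ordered row offset first, then column offset, as in the loops above. With the shapes from the question, pad is 10 and the kernel is 21 x 21, which matches the 441 weights per pixel.

```python
import numpy as np

def weighted_average_reference(inputs, weights):
    """Hypothetical NumPy reference for the per-pixel weighted sum.

    inputs:  [N, H, W, C]
    weights: [N, H - 2*pad, W - 2*pad, k*k] with k = 2*pad + 1
    """
    n, h, w, c = inputs.shape
    _, out_h, out_w, k2 = weights.shape
    pad = (h - out_h) // 2
    k = 2 * pad + 1
    assert k * k == k2, "last axis of weights must hold k*k kernel entries"
    assert w - out_w == 2 * pad, "height and width must be padded equally"

    out = np.zeros((n, out_h, out_w, c), dtype=np.result_type(inputs, weights))
    for i in range(k):
        for j in range(k):
            # Shifted view of the input for kernel position (i, j).
            shifted = inputs[:, i:i + out_h, j:j + out_w, :]
            # The same weight is applied to every channel of the pixel.
            out += shifted * weights[:, :, :, i * k + j, None]
    return out

# Shape arithmetic for the sizes in the question.
pad = (740 - 720) // 2        # 10
kernel_size = 2 * pad + 1     # 21, and 21 * 21 == 441
```

Accumulating one kernel position at a time avoids building the 441-channel stack, at the cost of 441 small multiply-add passes. Running it on a small random input and comparing against the TensorFlow output is a cheap correctness check before measuring speed.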
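Answer 0 (score: 0)
Wrapping the op in tf.device('/cpu:0') forces it to be computed on the CPU, where it uses the Eigen library, and this can make it faster.
I think that if it was being computed on the GPU, the slowdown may be related to all the tensor transformations.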
def weighted_averagex(inputs, weights):
    with tf.name_scope("weighted_average", "weighted_average", [inputs, weights]) as scope:
        with tf.device('/cpu:0'):
            in_shape = inputs.get_shape().as_list()
            w_shape = weights.get_shape().as_list()
            n_channels = in_shape[3]
            xs = tf.split(inputs, n_channels, axis=3)
            pad = (in_shape[1] - w_shape[1]) // 2
            kernel_size = pad * 2 + 1
            for index in range(n_channels):
                x = xs[index]
                x_stack = []
                for i in range(kernel_size):
                    for j in range(kernel_size):
                        x_stack.append(x[:, i:x.shape[1] - 2 * pad + i, j:x.shape[2] - 2 * pad + j, :])
                x_stack = tf.concat(x_stack, axis=3)
                x = tf.reduce_sum(tf.multiply(x_stack, weights), axis=3, keep_dims=True)
                xs[index] = x
            return tf.concat(xs, axis=3)