我想制作一个自定义图层,用于将密集图层的输出与Convolution2D图层融合。
这个想法来自this paper,这是网络:
融合层试图将Convolution2D张量(256x28x28
)与密集张量(256
)融合。这是它的等式:
y_global => Dense layer output with shape 256
y_mid => Convolution2D layer output with shape 256x28x28
以下是有关Fusion流程的论文说明:
我最终制作了一个新的自定义图层,如下所示:
class FusionLayer(Layer):
def __init__(self, output_dim, **kwargs):
self.output_dim = output_dim
super(FusionLayer, self).__init__(**kwargs)
def build(self, input_shape):
input_dim = input_shape[1][1]
initial_weight_value = np.random.random((input_dim, self.output_dim))
self.W = K.variable(initial_weight_value)
self.b = K.zeros((input_dim,))
self.trainable_weights = [self.W, self.b]
def call(self, inputs, mask=None):
y_global = inputs[0]
y_mid = inputs[1]
# the code below should be modified
output = K.dot(K.concatenate([y_global, y_mid]), self.W)
output += self.b
return self.activation(output)
def get_output_shape_for(self, input_shape):
assert input_shape and len(input_shape) == 2
return (input_shape[0], self.output_dim)
我认为我的__init__
和build
方法正确,但我不知道如何将y_global
(256个像素)与y-mid
(256x28x28维度)连接起来)在call
层中,以便输出与上面提到的等式相同。
如何在call
方法中实现此等式?
非常感谢...
更新:成功整合这两层数据的任何其他方式对我来说也是可以接受的......它并非完全必须是文中提到的方式,但它至少需要返回一个可接受的方式输出...
答案 0 :(得分:3)
我当时正在研究“图像着色”项目,最终遇到了融合层问题,然后我找到了一个包含融合层的模型。 希望可以在某种程度上解决您的问题。
embed_input = Input(shape=(1000,))
encoder_input = Input(shape=(256, 256, 1,))
#Encoder
encoder_output = Conv2D(64, (3,3), activation='relu', padding='same', strides=2,
bias_initializer=TruncatedNormal(mean=0.0, stddev=0.05))(encoder_input)
encoder_output = Conv2D(128, (3,3), activation='relu', padding='same',
bias_initializer=TruncatedNormal(mean=0.0, stddev=0.05))(encoder_output)
encoder_output = Conv2D(128, (3,3), activation='relu', padding='same', strides=2,
bias_initializer=TruncatedNormal(mean=0.0, stddev=0.05))(encoder_output)
encoder_output = Conv2D(256, (3,3), activation='relu', padding='same',
bias_initializer=TruncatedNormal(mean=0.0, stddev=0.05))(encoder_output)
encoder_output = Conv2D(256, (3,3), activation='relu', padding='same', strides=2,
bias_initializer=TruncatedNormal(mean=0.0, stddev=0.05))(encoder_output)
encoder_output = Conv2D(512, (3,3), activation='relu', padding='same',
bias_initializer=TruncatedNormal(mean=0.0, stddev=0.05))(encoder_output)
encoder_output = Conv2D(512, (3,3), activation='relu', padding='same',
bias_initializer=TruncatedNormal(mean=0.0, stddev=0.05))(encoder_output)
encoder_output = Conv2D(256, (3,3), activation='relu', padding='same',
bias_initializer=TruncatedNormal(mean=0.0, stddev=0.05))(encoder_output)
#Fusion
fusion_output = RepeatVector(32 * 32)(embed_input)
fusion_output = Reshape(([32, 32, 1000]))(fusion_output)
fusion_output = concatenate([encoder_output, fusion_output], axis=3)
fusion_output = Conv2D(256, (1, 1), activation='relu', padding='same',
bias_initializer=TruncatedNormal(mean=0.0, stddev=0.05))(fusion_output)
#Decoder
decoder_output = Conv2D(128, (3,3), activation='relu', padding='same',
bias_initializer=TruncatedNormal(mean=0.0, stddev=0.05))(fusion_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(64, (3,3), activation='relu', padding='same',
bias_initializer=TruncatedNormal(mean=0.0, stddev=0.05))(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
decoder_output = Conv2D(32, (3,3), activation='relu', padding='same',
bias_initializer=TruncatedNormal(mean=0.0, stddev=0.05))(decoder_output)
decoder_output = Conv2D(16, (3,3), activation='relu', padding='same',
bias_initializer=TruncatedNormal(mean=0.0, stddev=0.05))(decoder_output)
decoder_output = Conv2D(2, (3, 3), activation='tanh', padding='same',
bias_initializer=TruncatedNormal(mean=0.0, stddev=0.05))(decoder_output)
decoder_output = UpSampling2D((2, 2))(decoder_output)
model = Model(inputs=[encoder_input, embed_input], outputs=decoder_output)
答案 1 :(得分:1)