I want to use ELMo in Keras, and for that I want to use tensorflow-hub, probably along the lines of https://github.com/strongio/keras-elmo/blob/master/Elmo%20Keras.ipynb or https://tfhub.dev/google/elmo/2.
My problem is that I want to use a pretrained model that was not trained on words but on proteins (amino-acid sequences), and it should still be trainable on the data I provide. My idea is that I only need to load the weights of the pretrained protein model into the tensorflow-hub module to achieve this. I have the PyTorch model weights, and I have the tensorflow-hub module.
# I can create an ELMo embedding layer for "normal text" which is pretrained and also trainable
import tensorflow as tf
import tensorflow_hub as hub
from keras import backend as K
from keras import layers, models
from keras.engine import Layer


class ElmoEmbeddingLayer(Layer):
    def __init__(self, **kwargs):
        self.dimensions = 1024
        self.trainable = True
        super(ElmoEmbeddingLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.elmo = hub.Module('https://tfhub.dev/google/elmo/2', trainable=self.trainable,
                               name="{}_module".format(self.name))
        self.trainable_weights += K.tf.trainable_variables(scope="^{}_module/.*".format(self.name))
        super(ElmoEmbeddingLayer, self).build(input_shape)

    def call(self, x, mask=None):
        result = self.elmo(K.squeeze(K.cast(x, tf.string), axis=1),
                           as_dict=True,
                           signature='default',
                           )['default']
        return result

    def compute_mask(self, inputs, mask=None):
        return K.not_equal(inputs, '--PAD--')

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.dimensions)


input_text = layers.Input(shape=(1,), dtype="string")
embedding = ElmoEmbeddingLayer()(input_text)
dense = layers.Dense(256, activation='relu')(embedding)
pred = layers.Dense(1, activation='sigmoid')(dense)
model = models.Model(inputs=[input_text], outputs=pred)
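For reference, the model above can be trained roughly like this, following the strongio notebook (the sentences and labels below are made-up toy data, and the session setup is the usual TF1-style initialization the hub module needs):

import numpy as np

sess = tf.Session()
K.set_session(sess)
sess.run(tf.global_variables_initializer())
sess.run(tf.tables_initializer())

# toy example data: Keras expects an array of strings with shape (batch, 1)
train_text = np.array(["this movie was great", "this movie was terrible"], dtype=object)[:, np.newaxis]
train_labels = np.array([1, 0])

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(train_text, train_labels, epochs=1, batch_size=2)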
# I also have an embedder with the weights I want, but this one is not trainable with Keras
import os
from pathlib import Path
from allennlp.commands.elmo import ElmoEmbedder

cwd = os.getcwd()
model_dir = Path('../seqvec/uniref50_v2/')
weights = model_dir / 'weights.hdf5'
options = model_dir / 'options.json'
seqvec = ElmoEmbedder(options, weights, cuda_device=-1)  # cuda_device=-1 for CPU
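For completeness, this is roughly how I get embeddings out of that ElmoEmbedder (the sequence is a made-up example; embed_sentence returns one 1024-dim vector per residue and ELMo layer, which I then pool myself as in the SeqVec readme), but this way the weights are not trainable inside Keras:

seq = 'SEQWENCE'                              # toy amino-acid sequence
embedding = seqvec.embed_sentence(list(seq))  # numpy array of shape (3, len(seq), 1024)
per_residue = embedding.sum(axis=0)           # (len(seq), 1024)
per_protein = per_residue.mean(axis=0)        # (1024,) fixed-size protein embedding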
Please also answer if you don't know how to load the weights into tensorflow-hub but have another workaround that would let me use the ElmoEmbedder with trainable weights.