I am trying to fine-tune only specific last layers of BERT (say, the last 3 layers). I want to train on a TPU in Google Colab. I am loading BERT with hub.Module and fine-tuning it, then using the fine-tuned output for a classification task.

bert_module = hub.Module(BERT_MODEL_HUB, tags=tags, trainable=True)

hub.Module gives me the option to make the model trainable or not trainable, but not to make it partially trainable (only specific layers).

Does anyone know how I can use hub.Module to train only the last 1, 2, or 3 layers of BERT?

Thanks
Answer 0 (score: 1)
You can set it manually in the list of trainable variables. Below is my implementation of the BERT layer in tensorflow-keras:

import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras import backend as K

# bert_path is assumed to hold the URL of a BERT module on TF-Hub
class BertLayer(tf.keras.layers.Layer):
    def __init__(self, n_fine_tune_layers, **kwargs):
        self.n_fine_tune_layers = n_fine_tune_layers
        self.trainable = True
        self.output_size = 768
        super(BertLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.bert = hub.Module(
            bert_path,
            trainable=True,  # used True here in place of self.trainable
            name="{}_module".format(self.name)
        )
        trainable_vars = self.bert.variables
        # Drop the pre-training heads ("/cls/" variables)
        trainable_vars = [var for var in trainable_vars if "/cls/" not in var.name]

        # Select how many layers to fine-tune
        trainable_vars = trainable_vars[-self.n_fine_tune_layers:]

        # Add to trainable weights
        for var in trainable_vars:
            self._trainable_weights.append(var)

        # Everything else stays frozen
        for var in self.bert.variables:
            if var not in self._trainable_weights:
                self._non_trainable_weights.append(var)

        super(BertLayer, self).build(input_shape)

    def call(self, inputs):
        inputs = [K.cast(x, dtype="int32") for x in inputs]
        input_ids, input_mask, segment_ids = inputs
        bert_inputs = dict(
            input_ids=input_ids, input_mask=input_mask, segment_ids=segment_ids
        )
        result = self.bert(inputs=bert_inputs, signature="tokens", as_dict=True)[
            "pooled_output"
        ]
        return result

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_size)
Pay attention to the following line in the code above:
trainable_vars = trainable_vars[-self.n_fine_tune_layers:]
You can give the n_fine_tune_layers parameter a default value of 1, 2, or 3:
def __init__(self, n_fine_tune_layers=2, **kwargs):
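For completeness, here is a minimal sketch of wiring this BertLayer into a Keras classifier (my addition, not part of the original answer; max_seq_length and the dense-head sizes are placeholder choices, and the three inputs are assumed to come from the BERT tokenizer):

# Usage sketch: BertLayer inside a Keras binary classifier
max_seq_length = 128  # hypothetical sequence length

in_ids = tf.keras.layers.Input(shape=(max_seq_length,), name="input_ids")
in_mask = tf.keras.layers.Input(shape=(max_seq_length,), name="input_mask")
in_segment = tf.keras.layers.Input(shape=(max_seq_length,), name="segment_ids")

# Fine-tune only the last 3 selected variables/layers
bert_output = BertLayer(n_fine_tune_layers=3)([in_ids, in_mask, in_segment])

dense = tf.keras.layers.Dense(256, activation="relu")(bert_output)
pred = tf.keras.layers.Dense(1, activation="sigmoid")(dense)

model = tf.keras.models.Model(inputs=[in_ids, in_mask, in_segment], outputs=pred)
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

Note that because hub.Module is a TF 1.x construct, variables and tables still need to be initialized in the session before calling model.fit.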
Answer 1 (score: 1)
Modifying the code from the blog post, we can pick out the correct layers. The repo linked from the blog post also addresses this, though less cleanly.
    def build(self, input_shape):
        self.bert = hub.Module(
            bert_path,
            trainable=self.trainable,
            name="{}_module".format(self.name)
        )
        trainable_vars = self.bert.variables

        # Remove unused layers (the pre-training heads)
        trainable_vars = [var for var in trainable_vars if "/cls/" not in var.name]

        # =========== Replace the incorrect line with: ====================
        # Select how many layers to fine-tune.
        # Note: this selection is wrong in the original code.
        import re

        def layer_number(var):
            '''Get which encoder layer a variable belongs to'''
            m = re.search(r'/layer_(\d+)/', var.name)
            if m:
                return int(m.group(1))
            else:
                return None

        layer_numbers = list(map(layer_number, trainable_vars))
        n_layers = max(n for n in layer_numbers if n is not None) + 1  # layers are zero-indexed
        trainable_vars = [var for n, var in zip(layer_numbers, trainable_vars)
                          if n is not None and n >= n_layers - self.n_fine_tune_layers]
        # ========== Until here ====================

        # Add to trainable weights
        self._trainable_weights.extend(trainable_vars)

        # Add non-trainable weights
        for var in self.bert.variables:
            if var not in self._trainable_weights:
                self._non_trainable_weights.append(var)

        super(BertLayer, self).build(input_shape)
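To sanity-check the layer selection without loading the full module, the same regex logic can be exercised on a few made-up variable names in the layer_N naming style of the TF-Hub BERT module (this check is my addition, not part of the original answer):

import re

def layer_number(name):
    '''Return the encoder layer index embedded in a variable name, or None.'''
    m = re.search(r'/layer_(\d+)/', name)
    return int(m.group(1)) if m else None

# Hypothetical variable names mimicking the TF-Hub BERT naming scheme
names = [
    "bert_module/bert/embeddings/word_embeddings",
    "bert_module/bert/encoder/layer_0/attention/self/query/kernel",
    "bert_module/bert/encoder/layer_4/output/dense/kernel",
    "bert_module/bert/encoder/layer_11/output/dense/kernel",
    "bert_module/bert/pooler/dense/kernel",
]

numbers = [layer_number(n) for n in names]
n_layers = max(n for n in numbers if n is not None) + 1   # 12 in this toy example
n_fine_tune_layers = 3
keep = [n for num, n in zip(numbers, names)
        if num is not None and num >= n_layers - n_fine_tune_layers]
print(keep)  # only the layer_11 variable survives; layer_0 and layer_4 stay frozen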
Answer 2 (score: 0)
The following code, taken straight from the article (https://towardsdatascience.com/bert-in-keras-with-tensorflow-hub-76bcbc9417b), is not correct.
trainable_vars = self.bert.variables
trainable_vars = trainable_vars[-self.n_fine_tune_layers:]
self.bert.variables returns the variables in alphabetical order, not in actual layer order. As a result, layer_11 comes back before layer_4, and so on. That is not what you want.

I have not yet figured out exactly how to get the actual order in which the layers appear, but I will update this answer when I do!
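To see the problem concretely, here is a small illustration with made-up names following the layer_N pattern (my example, not from the article): the tail of an alphabetically sorted list does not contain the last encoder layers.

# Illustration of the alphabetical-ordering pitfall, with made-up names
names = ["layer_%d" % i for i in range(12)]

print(sorted(names))
# ['layer_0', 'layer_1', 'layer_10', 'layer_11', 'layer_2', ..., 'layer_9']

print(sorted(names)[-3:])
# ['layer_7', 'layer_8', 'layer_9'] -- NOT the last three encoder layers,
# because 'layer_10' and 'layer_11' sort right after 'layer_1'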