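The idea behind this is that I want to try some old-school gradient-ascent-style visualization with the BERT model.
I want to know how the input affects a specific dimension of a specific layer. So I take the gradient of one dimension of a given layer's output with respect to the output of the first (word embedding) layer.
The best I have been able to do so far is: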
import torch
from transformers import BertTokenizer, BertModel

model_version = 'bert-base-uncased'
model = BertModel.from_pretrained(model_version, output_attentions=True, output_hidden_states=True)
model.eval()  # turn off dropout so the gradients are deterministic
tokenizer = BertTokenizer.from_pretrained(model_version, do_lower_case=True)

s = 'I want to sleep'
# s is a plain string, so it is not passed as pre-tokenized input
inputs = tokenizer(s, return_tensors='pt', add_special_tokens=False)
input_ids = inputs['input_ids']
output = model(input_ids)
hidden_states = output[-2]  # with both output flags set, the last two outputs are (hidden_states, attentions)
X = hidden_states[0]  # output of the embedding layer, shape: [1, 4, 768] (batch_size, sentence_length, embedding dimension)
y = hidden_states[3][0][0][0]  # dimension 0 at position 0 of the 3rd hidden layer's output; a 0-dim scalar tensor
grads = torch.autograd.grad(y, X, retain_graph=True, create_graph=True)  # gradient of y w.r.t. X; since y is a scalar, grads[0] has the same shape as X
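However, this is not good enough. I want the gradient with respect to the actual word embedding layer, but the Transformer's embedding also includes "position_embeddings" and "token_type_embeddings". Here is the code for the first embedding layer: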
class BertEmbeddings(nn.Module):
    """Construct the embeddings from word, position and token_type embeddings.
    """
    def __init__(self, config):
        super(BertEmbeddings, self).__init__()
        self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=0)
        self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.hidden_size)
        self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size)
        # self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load
        # any TensorFlow checkpoint file
        self.LayerNorm = BertLayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)

    def forward(self, input_ids, token_type_ids=None, position_ids=None):
        seq_length = input_ids.size(1)
        if position_ids is None:
            position_ids = torch.arange(seq_length, dtype=torch.long, device=input_ids.device)
            position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
        if token_type_ids is None:
            token_type_ids = torch.zeros_like(input_ids)
        words_embeddings = self.word_embeddings(input_ids)
        position_embeddings = self.position_embeddings(position_ids)
        token_type_embeddings = self.token_type_embeddings(token_type_ids)
        embeddings = words_embeddings + position_embeddings + token_type_embeddings
        embeddings = self.LayerNorm(embeddings)
        embeddings = self.dropout(embeddings)
        return embeddings
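Ideally, I want the gradient with respect to "words_embeddings" only, rather than with respect to "words_embeddings + position_embeddings + token_type_embeddings" followed by LayerNorm and dropout.
I think I can do this by modifying the model. Is there a way to change the model?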
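One route that may avoid editing the model code, as a minimal sketch rather than a definitive answer: `BertModel` accepts an `inputs_embeds` argument in place of `input_ids`, so the word embeddings can be looked up by hand and used as the tensor to differentiate against. The position and token-type embeddings, LayerNorm and dropout are then still applied inside the model. The tiny randomly initialised config and the token ids below are assumptions made so the sketch runs without downloading weights; with the real model you would load `bert-base-uncased` and take `input_ids` from the tokenizer.

```python
import torch
from transformers import BertConfig, BertModel

# Assumption: a small random BERT stands in for 'bert-base-uncased'
# so that the example needs no download.
torch.manual_seed(0)
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=4,
                    num_attention_heads=4, intermediate_size=64)
model = BertModel(config)
model.eval()  # disable dropout so the gradient is deterministic

input_ids = torch.tensor([[5, 17, 42, 8]])  # hypothetical token ids, shape [1, 4]

# Look up the word embeddings ourselves and turn them into a leaf tensor
# that autograd can differentiate with respect to.
word_embeds = model.embeddings.word_embeddings(input_ids).detach().requires_grad_(True)

# inputs_embeds replaces the internal word lookup; position/token-type
# embeddings, LayerNorm and dropout are still added inside the model.
out = model(inputs_embeds=word_embeds, output_hidden_states=True)
hidden_states = out.hidden_states  # tuple: embedding output + one entry per layer

y = hidden_states[3][0, 0, 0]  # layer 3, batch 0, position 0, dimension 0
(grad,) = torch.autograd.grad(y, word_embeds)
print(grad.shape)  # same shape as word_embeds
```

Here `grad` is the gradient of the chosen scalar with respect to the word embeddings alone, since `word_embeds` enters the graph before the other two embeddings are added.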