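I have been trying to learn multi-class classification with BERT by working through the example given here: https://towardsdatascience.com/building-a-multi-label-text-classifier-using-bert-and-tensorflow-f188e0ecdc5d
I was able to get all of the code working with the sigmoid function (multi-label). However, I want to switch to the softmax approach (multi-class). I uncommented the existing multi-class code and commented out the multi-label code, but I now get an error (pasted below) at the multiplication that computes the loss in the create_model() function. Please let me know what I am doing wrong and how the multi-class case should be handled.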
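I tried changing the line to this: per_example_loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
This runs, but I don't know whether it changes the result of the calculation. I want to make sure the model works the right way.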
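One way to check whether the replacement line changes the calculation is to compare the two formulas on toy data. The following is a minimal NumPy sketch, not TensorFlow itself: the helper names (`log_softmax`, `loss_from_one_hot`, `loss_from_class_ids`) and the toy values are made up for illustration. It assumes each row of the labels is a one-hot vector, in which case softmax cross-entropy reduces to the negative log-probability of the true class.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def loss_from_one_hot(one_hot_labels, logits):
    # Softmax cross-entropy against label rows: -sum(labels * log_softmax(logits)).
    return -(one_hot_labels * log_softmax(logits)).sum(axis=-1)

def loss_from_class_ids(class_ids, logits):
    # Same loss written as "negative log-probability of the true class".
    rows = np.arange(logits.shape[0])
    return -log_softmax(logits)[rows, class_ids]

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 6))        # toy batch: 4 examples, 6 classes
class_ids = np.array([0, 3, 5, 2])      # hypothetical integer class labels
one_hot_labels = np.eye(6)[class_ids]   # shape (4, 6), one 1.0 per row

loss_a = loss_from_one_hot(one_hot_labels, logits)
loss_b = loss_from_class_ids(class_ids, logits)
print(np.allclose(loss_a, loss_b))
```

Under that assumption the two routes agree, so which one applies depends only on whether the labels arrive as class indices or as one-hot rows.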
def create_model(bert_config, is_training, input_ids, input_mask, segment_ids,
                 labels, num_labels, use_one_hot_embeddings):
  """Creates a classification model."""
  model = modeling.BertModel(
      config=bert_config,
      is_training=is_training,
      input_ids=input_ids,
      input_mask=input_mask,
      token_type_ids=segment_ids,
      use_one_hot_embeddings=use_one_hot_embeddings)

  # In the demo, we are doing a simple classification task on the entire
  # segment.
  #
  # If you want to use the token-level output, use model.get_sequence_output()
  # instead.
  output_layer = model.get_pooled_output()

  hidden_size = output_layer.shape[-1].value

  output_weights = tf.get_variable(
      "output_weights", [num_labels, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable(
      "output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):
    if is_training:
      # I.e., 0.1 dropout
      output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)

    # MULTILABEL
    # probabilities = tf.nn.softmax(logits, axis=-1) ### multiclass case
    # probabilities = tf.nn.sigmoid(logits) #### multi-label case
    #
    # labels = tf.cast(labels, tf.float32)
    # tf.logging.info("num_labels:{};logits:{};labels:{}".format(num_labels, logits, labels))
    # per_example_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
    # loss = tf.reduce_mean(per_example_loss)

    # MULTICLASS STUFF
    probabilities = tf.nn.softmax(logits, axis=-1)
    log_probs = tf.nn.log_softmax(logits, axis=-1)

    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
    loss = tf.reduce_mean(per_example_loss)

    return (loss, per_example_loss, logits, probabilities)
Error message

  File "path/Try.py", line 571, in model_fn
    num_labels, use_one_hot_embeddings)
  File "path/Try.py", line 539, in create_model
    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
  File "path/venv/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py", line 884, in binary_op_wrapper
    return func(x, y, name=name)
  File "path/venv/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py", line 1180, in _mul_dispatch
    return gen_math_ops.mul(x, y, name=name)
  File "path/venv/lib/python3.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 6490, in mul
    "Mul", x=x, y=y, name=name)
  File "path/venv/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "path/venv/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "path/venv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "path/venv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2027, in __init__
    control_input_ops)
  File "path/venv/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1867, in _create_c_op
    raise ValueError(str(e))
ValueError: Dimensions must be equal, but are 6 and 32 for 'loss/mul' (op: 'Mul') with input shapes: [32,6,6], [32,6].
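
The shapes in the message suggest that the labels already have shape [32, 6], one row of 0/1 flags per example as in the multi-label setup, so one-hot encoding them a second time produces [32, 6, 6]. Below is a minimal NumPy sketch of that reading. It is an illustration under assumptions, not the TensorFlow graph itself: `np.eye(n)[x]` stands in for `tf.one_hot`, the data is random, and the uniform `log_probs` are placeholder values.

```python
import numpy as np

batch_size, num_labels = 32, 6
rng = np.random.default_rng(0)

# Assumption: labels arrive as one-hot rows of shape (32, 6), not as class ids.
class_ids = rng.integers(0, num_labels, size=batch_size)
labels = np.eye(num_labels, dtype=np.int64)[class_ids]          # (32, 6)

# Placeholder log-probabilities: a uniform distribution over 6 classes.
log_probs = np.log(np.full((batch_size, num_labels), 1.0 / num_labels))

# One-hot encoding the already one-hot labels adds a third axis.
double_one_hot = np.eye(num_labels)[labels]                     # (32, 6, 6)

try:
    double_one_hot * log_probs      # (32, 6, 6) * (32, 6): 6 vs 32 clash
    mismatch = False
except ValueError:
    mismatch = True

# Possible fix under that assumption: recover class ids, then one-hot once.
recovered_ids = labels.argmax(axis=-1)                          # (32,)
one_hot_labels = np.eye(num_labels)[recovered_ids]              # (32, 6)
per_example_loss = -(one_hot_labels * log_probs).sum(axis=-1)   # (32,)
print(double_one_hot.shape, mismatch, per_example_loss.shape)
```

If the labels really are one-hot rows, the TensorFlow equivalents would be either to skip the `tf.one_hot` call and use the labels cast to float directly, or to apply `tf.argmax(labels, axis=-1)` before `tf.one_hot`.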