I'm using BERT for a binary classification task with a batch size of 8, but when I compute the loss I always get the following error:
Traceback (most recent call last):
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1356, in _do_call
    return fn(*args)
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [3] vs. [8]
     [[{{node gradients/sub_grad/BroadcastGradientArgs}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:/project_chris/aad_bert_version/run.py", line 81, in <module>
    input_y: y_train})
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\client\session.py", line 950, in run
    run_metadata_ptr)
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1350, in _do_run
    run_metadata)
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [3] vs. [8]
     [[node gradients/sub_grad/BroadcastGradientArgs (defined at E:/project_chris/aad_bert_version/run.py:57)]]

Original stack trace for 'gradients/sub_grad/BroadcastGradientArgs':
  File "E:/project_chris/aad_bert_version/run.py", line 57, in <module>
    train_op = tf.train.AdamOptimizer(lr).minimize(loss)
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\training\optimizer.py", line 403, in minimize
    grad_loss=grad_loss)
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\training\optimizer.py", line 512, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 158, in gradients
    unconnected_gradients)
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 731, in _GradientsHelper
    lambda: grad_fn(op, *out_grads))
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 403, in _MaybeCompile
    return grad_fn()  # Exit early
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\ops\gradients_util.py", line 731, in <lambda>
    lambda: grad_fn(op, *out_grads))
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\ops\math_grad.py", line 1027, in _SubGrad
    rx, ry = gen_array_ops.broadcast_gradient_args(sx, sy)
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 1004, in broadcast_gradient_args
    "BroadcastGradientArgs", s0=s0, s1=s1, name=name)
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\ops.py", line 3616, in create_op
    op_def=op_def)
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

...which was originally created as op 'sub', defined at:
  File "E:/project_chris/aad_bert_version/run.py", line 56, in <module>
    loss = tf.reduce_mean(tf.square(tf.reshape(pred, [-1]) - tf.reshape(input_y, [-1])))
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\ops\math_ops.py", line 884, in binary_op_wrapper
    return func(x, y, name=name)
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 11574, in sub
    "Sub", x=x, y=y, name=name)
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\ops.py", line 3616, in create_op
    op_def=op_def)
  File "C:\Users\Meiwei\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\framework\ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()
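If I read the message correctly, the 'sub' op inside my loss is receiving one tensor whose first dimension is 3 and another whose first dimension is 8 (my batch size). A tiny standalone sketch, unrelated to my model, reproduces the same exception:

import tensorflow as tf  # TF 1.x, same as my setup

a = tf.placeholder(tf.float32, shape=[None])
b = tf.placeholder(tf.float32, shape=[None])
diff = a - b  # the same kind of 'sub' op as in my loss

with tf.Session() as sess:
    # feeding 3 values against 8 raises:
    # InvalidArgumentError: Incompatible shapes: [3] vs. [8]
    sess.run(diff, feed_dict={a: [1., 2., 3.],
                              b: [1., 2., 3., 4., 5., 6., 7., 8.]})

So somewhere pred and input_y must end up with different batch dimensions, but I don't see where.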
Here is my code (run.py):

import os
import numpy as np
import tensorflow as tf
import modeling       # modeling.py from the BERT codebase
import tokenization   # tokenization.py from the BERT codebase

lr = 0.0006  # learning rate

# config files
data_root = './bert_model_chinese'
bert_config_file = os.path.join(data_root, 'bert_config.json')
bert_config = modeling.BertConfig.from_json_file(bert_config_file)
init_checkpoint = os.path.join(data_root, 'bert_model.ckpt')
bert_vocab_file = os.path.join(data_root, 'vocab.txt')
token = tokenization.CharTokenizer(vocab_file=bert_vocab_file)
input_ids = tf.placeholder(tf.int32, shape=[None, None], name='input_ids')
input_mask = tf.placeholder(tf.int32, shape=[None, None], name='input_masks')
segment_ids = tf.placeholder(tf.int32, shape=[None, None], name='segment_ids')
input_y = tf.placeholder(tf.float32, shape=[None, 1], name="input_y")
weights = {
    'out': tf.Variable(tf.random_normal([768, 1]))
}
biases = {
    'out': tf.Variable(tf.constant(0.1, shape=[1, ]))
}
model = modeling.BertModel(
    config=bert_config,
    is_training=False,
    input_ids=input_ids,
    input_mask=input_mask,
    token_type_ids=segment_ids,
    use_one_hot_embeddings=False)
tvars = tf.trainable_variables()
(assignment, initialized_variable_names) = modeling.get_assignment_map_from_checkpoint(tvars, init_checkpoint)
tf.train.init_from_checkpoint(init_checkpoint, assignment)
output_layer_pooled = model.get_pooled_output()  # pooled sentence-level output
output_layer_pooled = tf.nn.dropout(output_layer_pooled, keep_prob=0.9)
w_out = weights['out']
b_out = biases['out']
pred = tf.add(tf.matmul(output_layer_pooled, w_out), b_out, name="pre1")
pred = tf.reshape(pred, shape=[-1, 1], name="pre")
loss = tf.reduce_mean(tf.square(tf.reshape(pred, [-1]) - tf.reshape(input_y, [-1])))
train_op = tf.train.AdamOptimizer(lr).minimize(loss)
EPOCHS = 5
max_sentence_length = 512
batch_size = 8
data_path = './data'
train_input, predict_input = fffffuck(data_path, bert_vocab_file, True, True,
                                      './temp', max_sentence_length, batch_size,
                                      batch_size, batch_size)
data_loader = TextLoader(train_input, batch_size)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(EPOCHS):
        data_loader.shuff()
        for j in range(data_loader.num_batches):
            x_train, y_train = data_loader.next_batch(j)
            print(y_train)
            print(y_train.shape)
            x_input_ids = x_train[0]
            x_input_mask = x_train[1]
            x_segment_ids = x_train[2]
            loss_, _ = sess.run([loss, train_op],
                                feed_dict={input_ids: x_input_ids,
                                           input_mask: x_input_mask,
                                           segment_ids: x_segment_ids,
                                           input_y: y_train})
            print('loss:', loss_)
TextLoader is defined as follows:

class TextLoader(object):
    def __init__(self, dataSet, batch_size):
        self.data = dataSet
        self.batch_size = batch_size
        self.shuff()

    def shuff(self):
        self.num_batches = int(len(self.data) // self.batch_size)
        if self.num_batches == 0:
            assert False, 'Not enough data, make batch_size small.'
        np.random.shuffle(self.data)

    def next_batch(self, k):
        x = []
        y = []
        for i in range(self.batch_size):
            tmp = list(self.data)[k * self.batch_size + i][:3]
            x.append(tmp)
            y_ = list(self.data)[k * self.batch_size + i][3]
            y.append(y_)
        x = np.array(x)
        return x, np.array(y).reshape([self.batch_size, 1])
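To narrow things down, I mocked up what next_batch returns using plain numpy (dummy data; seq_len = 512 just mirrors my max_sentence_length):

import numpy as np

batch_size, seq_len = 8, 512
# each record: (input_ids, input_mask, segment_ids, label)
data = [(np.zeros(seq_len, dtype=np.int32),
         np.ones(seq_len, dtype=np.int32),
         np.zeros(seq_len, dtype=np.int32),
         1) for _ in range(batch_size)]

x = np.array([record[:3] for record in data])  # mirrors how next_batch builds x
print(x.shape)     # (8, 3, 512): (batch, 3 feature arrays, seq_len)
print(x[0].shape)  # (3, 512): this is what x_train[0] evaluates to

Is that (3, 512) shape where the [3] in the error message comes from? Any pointers would be appreciated.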