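I am using the Universal Sentence Encoder for binary classification. After training the model for 10 epochs, I get a validation accuracy of about 0.69. When I predict on the test dataset, however, the predicted probabilities all fall between 0.75 and 0.79, so every sample is assigned to class 1 and the other class scores 0.
The actual target distribution is {0: 2340, 1: 1557}.
Is there anything wrong with the following architecture?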
# Assumed imports (standalone Keras on TensorFlow 1.x); `embed` (the loaded
# Universal Sentence Encoder module) and `STAMP` are defined elsewhere.
import tensorflow as tf
from keras import backend as K
from keras import regularizers
from keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from keras.layers import Dense, Input, Lambda
from keras.models import Model

# Counter({0: 2340, 1: 1557})  # target distribution

def UniversalEmbedding(x):
    # Map a batch of raw strings to 512-dimensional sentence embeddings.
    return embed(tf.squeeze(tf.cast(x, tf.string)),
                 signature="default", as_dict=True)["default"]

input_text = Input(shape=(1,), dtype=tf.string)
embedding = Lambda(UniversalEmbedding, output_shape=(512,))(input_text)
dense = Dense(256, activation='sigmoid',
              kernel_regularizer=regularizers.l2(0.001))(embedding)
pred = Dense(1, activation='sigmoid')(dense)
model = Model(inputs=[input_text], outputs=pred)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

monitor_metric = 'val_loss'
learning_rate = ReduceLROnPlateau(monitor=monitor_metric, cooldown=1)
early_stopping = EarlyStopping(monitor=monitor_metric, patience=3)
best_model_path = STAMP + '_trnsfr_lrn_USE_' + '.h5'
model_checkpoint = ModelCheckpoint(best_model_path, save_best_only=True,
                                   save_weights_only=True)
callbacks = [learning_rate, early_stopping, model_checkpoint]

with tf.Session() as session:
    K.set_session(session)
    session.run(tf.global_variables_initializer())
    session.run(tf.tables_initializer())
    history = model.fit(X_train,
                        y_train,
                        validation_split=0.2,
                        epochs=10,
                        batch_size=128,
                        callbacks=callbacks)
Output
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_22 (InputLayer)        (None, 1)                 0
_________________________________________________________________
lambda_22 (Lambda)           (None, 512)               0
_________________________________________________________________
dense_39 (Dense)             (None, 256)               131328
_________________________________________________________________
dense_40 (Dense)             (None, 1)                 257
=================================================================
Total params: 131,585
Trainable params: 131,585
Non-trainable params: 0
_________________________________________________________________
Train on 7274 samples, validate on 1819 samples
Epoch 1/10
7274/7274 [==============================] - 34s 5ms/step - loss: 0.8704 - acc: 0.5734 - val_loss: 0.7177 - val_acc: 0.6295
Epoch 2/10
7274/7274 [==============================] - 16s 2ms/step - loss: 0.6791 - acc: 0.6358 - val_loss: 0.6496 - val_acc: 0.6366
Epoch 3/10
7274/7274 [==============================] - 16s 2ms/step - loss: 0.6470 - acc: 0.6611 - val_loss: 0.6376 - val_acc: 0.6619
Epoch 4/10
7274/7274 [==============================] - 16s 2ms/step - loss: 0.6411 - acc: 0.6760 - val_loss: 0.6343 - val_acc: 0.6866
Epoch 5/10
7274/7274 [==============================] - 16s 2ms/step - loss: 0.6380 - acc: 0.6795 - val_loss: 0.6325 - val_acc: 0.6954
Epoch 6/10
7274/7274 [==============================] - 16s 2ms/step - loss: 0.6354 - acc: 0.6892 - val_loss: 0.6300 - val_acc: 0.6982
Epoch 7/10
7274/7274 [==============================] - 16s 2ms/step - loss: 0.6345 - acc: 0.6918 - val_loss: 0.6275 - val_acc: 0.6800
Epoch 8/10
7274/7274 [==============================] - 16s 2ms/step - loss: 0.6326 - acc: 0.6930 - val_loss: 0.6254 - val_acc: 0.6773
Epoch 9/10
7274/7274 [==============================] - 16s 2ms/step - loss: 0.6309 - acc: 0.6984 - val_loss: 0.6234 - val_acc: 0.7037
Epoch 10/10
7274/7274 [==============================] - 16s 2ms/step - loss: 0.6288 - acc: 0.7000 - val_loss: 0.6208 - val_acc: 0.6976
Predicting on the validation dataset
new_text = np.squeeze(X_val)
with tf.Session() as session:
    K.set_session(session)
    session.run(tf.global_variables_initializer())
    session.run(tf.tables_initializer())
    predict_proba = model.predict(new_text, batch_size=128)

prediction = np.where(predict_proba > 0.50, 1, 0)
print(classification_report(y_val, np.ravel(prediction)))
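One thing that may be worth checking in the prediction snippet above: it opens a fresh `tf.Session` and runs `tf.global_variables_initializer()` again, which assigns initial values to the variables rather than the values learned during `fit`. The toy NumPy logistic regression below is only an analogy, not the Keras model; `init_params`, `predict_proba`, and the data are hypothetical. It sketches how predicting with re-initialized parameters can collapse every prediction into a single class even though training worked.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params():
    # Hypothetical stand-in for running a variable initializer. Zeros are
    # used here; a Keras Dense layer would start from small random values.
    return np.zeros(1), 0.0

def predict_proba(x, w, b):
    return sigmoid(x * w[0] + b)

# Toy 1-D data: the label is 1 exactly when the feature is positive.
x = np.linspace(-2.0, 2.0, 40)
y = (x > 0).astype(float)

# "Training": plain gradient descent on the logistic loss.
w, b = init_params()
for _ in range(500):
    p = predict_proba(x, w, b)
    w = w - 0.5 * np.array([np.mean((p - y) * x)])
    b = b - 0.5 * np.mean(p - y)

trained_pred = np.where(predict_proba(x, w, b) > 0.5, 1, 0)
trained_acc = np.mean(trained_pred == y)

# Re-initializing before predicting throws the learned values away.
w_reset, b_reset = init_params()
reset_proba = predict_proba(x, w_reset, b_reset)
reset_pred = np.where(reset_proba > 0.5, 1, 0)

print("accuracy with trained parameters:", trained_acc)
print("distinct predictions after reset:", np.unique(reset_pred))
```

If this is what is happening here, one option to try (untested against this setup) is calling `model.load_weights(best_model_path)` after the initializers and before `model.predict`, since `ModelCheckpoint` already saved the best weights to that path.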
Output
Exception ignored in: <bound method BaseSession._Callable.__del__ of <tensorflow.python.client.session.BaseSession._Callable object at 0x7f737c5ff940>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1473, in __del__
    self._session._session, self._handle)
tensorflow.python.framework.errors_impl.CancelledError: (None, None, 'Session has been closed.')
              precision    recall  f1-score   support

           0       0.00      0.00      0.00      2340
           1       0.40      1.00      0.57      1557

    accuracy                           0.40      3897
   macro avg       0.20      0.50      0.29      3897
weighted avg       0.16      0.40      0.23      3897
Unique pred [1]
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/classification.py:1437: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.
'precision', 'predicted', average, warn_for)
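As a sanity check on the report above, the figures can be recomputed by hand under the assumption that every sample is predicted as class 1 (which is what "Unique pred [1]" suggests). The sketch below uses only the two support counts from the report; the variable names are illustrative.

```python
# Class supports taken from the classification report.
support = {0: 2340, 1: 1557}
total = sum(support.values())

# Assumption: the model predicts class 1 for every sample.
precision = {0: 0.0, 1: support[1] / total}  # class 0 is never predicted
recall = {0: 0.0, 1: 1.0}

def f1(p, r):
    # Harmonic mean of precision and recall, defined as 0 when both are 0.
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

f1_score = {c: f1(precision[c], recall[c]) for c in support}
accuracy = support[1] / total

macro_f1 = sum(f1_score.values()) / len(support)
weighted_f1 = sum(f1_score[c] * support[c] for c in support) / total

# Accuracy of always predicting the majority class, for comparison.
baseline = max(support.values()) / total

print(round(precision[1], 2), round(f1_score[1], 2))
print(round(accuracy, 2), round(macro_f1, 2), round(weighted_f1, 2))
print(round(baseline, 2))
```

The rounded values (0.40 precision and 0.57 F1 for class 1, 0.40 accuracy, 0.29 macro F1, 0.23 weighted F1) match the report, so the report is consistent with a constant class-1 prediction. Always predicting class 0 would score about 0.60 on this set, which is a useful baseline when judging the 0.69 validation accuracy.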