Binary classification with the Universal Sentence Encoder in Keras

Date: 2019-06-24 10:30:37

Tags: python tensorflow keras deep-learning classification

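I am using the Universal Sentence Encoder for binary classification. After training the model for 10 epochs I get a val acc of about 0.69. But when I predict on the test data, every predicted probability lies between 0.75 and 0.79. With a 0.5 threshold every sample is therefore assigned class 1, and the other class gets a score of 0.

The actual target distribution is {0: 2340, 1: 1557}. Is anything wrong with the architecture below?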
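For reference, the class counts above fix two numbers that are useful when reading the results: the accuracy of always predicting the majority class, and "balanced" class weights that could be passed to Keras through `model.fit(..., class_weight=...)`. This is a minimal plain-Python sketch; the formula `total / (n_classes * count)` is the one scikit-learn uses for `class_weight='balanced'`, and whether reweighting would help this particular model is an assumption, not something the question tested.

```python
# Target distribution reported in the question
counts = {0: 2340, 1: 1557}
total = sum(counts.values())  # 3897 samples

# Accuracy obtained by always predicting the majority class (0)
majority_baseline = max(counts.values()) / total

# "Balanced" weights: total / (n_classes * count_of_class)
# (assumption: this is only one possible reweighting scheme)
class_weight = {c: total / (len(counts) * n) for c, n in counts.items()}

print(f"majority-class baseline accuracy: {majority_baseline:.4f}")
print("class_weight:", {c: round(w, 4) for c, w in class_weight.items()})
```

On a set with this class mix, always answering 0 scores about 0.60, so a val_acc near 0.69 is above that baseline (assuming the validation split has a similar mix). With these weights, each class contributes the same total weight, 3897 / 2, to the loss.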

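# Imports (not shown in the question; assumed: TF 1.x, standalone Keras, tensorflow_hub)
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
from keras import backend as K
from keras import regularizers
from keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from keras.layers import Dense, Input, Lambda
from keras.models import Model
from sklearn.metrics import classification_report

# Counter({0: 2340, 1: 1557}) # target distribution

# `embed` is the TF-Hub Universal Sentence Encoder module, loaded elsewhere
# (not shown in the question), presumably with hub.Module(...)
def UniversalEmbedding(x):
    # Squeeze the (batch, 1) string input to (batch,) and return the 512-d sentence embeddings
    return embed(tf.squeeze(tf.cast(x, tf.string)),
                 signature="default", as_dict=True)["default"]

input_text = Input(shape=(1,), dtype=tf.string)
embedding = Lambda(UniversalEmbedding, output_shape=(512,))(input_text)
dense = Dense(256, activation='sigmoid', kernel_regularizer=regularizers.l2(0.001))(embedding)
pred = Dense(1, activation='sigmoid')(dense)  # probability of class 1
model = Model(inputs=[input_text], outputs=pred)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()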



# All three callbacks watch the validation loss
monitor_metric = 'val_loss'
learning_rate = ReduceLROnPlateau(monitor=monitor_metric, cooldown=1)
early_stopping = EarlyStopping(monitor=monitor_metric, patience=3)
# STAMP is a filename prefix defined elsewhere (not shown in the question)
best_model_path = STAMP + '_trnsfr_lrn_USE_' + '.h5'
model_checkpoint = ModelCheckpoint(best_model_path, save_best_only=True, save_weights_only=True)
callbacks = [learning_rate, early_stopping, model_checkpoint]


# X_train: array of sentences, y_train: 0/1 labels (both prepared elsewhere)
with tf.Session() as session:
  K.set_session(session)
  # Initialize all graph variables and the encoder's lookup tables
  session.run(tf.global_variables_initializer())
  session.run(tf.tables_initializer())
  history = model.fit(X_train,
            y_train,
            validation_split=0.2,
            epochs=10,
            batch_size=128,
            callbacks=callbacks)
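The parameter counts in the `model.summary()` output for this architecture can be checked by hand: a Dense layer has `n_in * n_out` kernel weights plus `n_out` biases, and, as the summary shows, the Lambda layer wrapping the encoder reports 0 parameters, so only the two Dense layers on top of the 512-d embeddings are counted. A small arithmetic sketch (the helper name `dense_params` is hypothetical):

```python
def dense_params(n_in, n_out):
    """Kernel weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

embedding_dim = 512                        # output_shape of the Lambda (USE) layer
hidden = dense_params(embedding_dim, 256)  # dense_39 in the summary
output = dense_params(256, 1)              # dense_40 in the summary
print(hidden, output, hidden + output)
```

The L2 kernel regularizer adds a loss term but no parameters, so it does not change these counts.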

Output

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_22 (InputLayer)        (None, 1)                 0         
_________________________________________________________________
lambda_22 (Lambda)           (None, 512)               0         
_________________________________________________________________
dense_39 (Dense)             (None, 256)               131328    
_________________________________________________________________
dense_40 (Dense)             (None, 1)                 257       
=================================================================
Total params: 131,585
Trainable params: 131,585
Non-trainable params: 0
_________________________________________________________________
Train on 7274 samples, validate on 1819 samples
Epoch 1/10
7274/7274 [==============================] - 34s 5ms/step - loss: 0.8704 - acc: 0.5734 - val_loss: 0.7177 - val_acc: 0.6295
Epoch 2/10
7274/7274 [==============================] - 16s 2ms/step - loss: 0.6791 - acc: 0.6358 - val_loss: 0.6496 - val_acc: 0.6366
Epoch 3/10
7274/7274 [==============================] - 16s 2ms/step - loss: 0.6470 - acc: 0.6611 - val_loss: 0.6376 - val_acc: 0.6619
Epoch 4/10
7274/7274 [==============================] - 16s 2ms/step - loss: 0.6411 - acc: 0.6760 - val_loss: 0.6343 - val_acc: 0.6866
Epoch 5/10
7274/7274 [==============================] - 16s 2ms/step - loss: 0.6380 - acc: 0.6795 - val_loss: 0.6325 - val_acc: 0.6954
Epoch 6/10
7274/7274 [==============================] - 16s 2ms/step - loss: 0.6354 - acc: 0.6892 - val_loss: 0.6300 - val_acc: 0.6982
Epoch 7/10
7274/7274 [==============================] - 16s 2ms/step - loss: 0.6345 - acc: 0.6918 - val_loss: 0.6275 - val_acc: 0.6800
Epoch 8/10
7274/7274 [==============================] - 16s 2ms/step - loss: 0.6326 - acc: 0.6930 - val_loss: 0.6254 - val_acc: 0.6773
Epoch 9/10
7274/7274 [==============================] - 16s 2ms/step - loss: 0.6309 - acc: 0.6984 - val_loss: 0.6234 - val_acc: 0.7037
Epoch 10/10
7274/7274 [==============================] - 16s 2ms/step - loss: 0.6288 - acc: 0.7000 - val_loss: 0.6208 - val_acc: 0.6976
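The logged `val_loss` falls in every one of the 10 epochs. Replaying the improvement check that `EarlyStopping(patience=3)` and `ModelCheckpoint(save_best_only=True)` apply to the monitored loss shows what that implies: early stopping never fires, and a checkpoint would be written after every epoch, so the file at `best_model_path` ends up holding the epoch-10 weights. This is a simplified re-implementation (mode "min", `min_delta=0`), not Keras's own code; the numbers are copied from the log above.

```python
# val_loss / val_acc per epoch, copied from the training log
val_loss = [0.7177, 0.6496, 0.6376, 0.6343, 0.6325,
            0.6300, 0.6275, 0.6254, 0.6234, 0.6208]
val_acc = [0.6295, 0.6366, 0.6619, 0.6866, 0.6954,
           0.6982, 0.6800, 0.6773, 0.7037, 0.6976]

def replay_callbacks(losses, patience=3):
    """Simplified replay of the EarlyStopping / save_best_only logic on a
    monitored loss (lower is better, min_delta=0). Not the Keras source."""
    best, wait, saved_epochs = float("inf"), 0, []
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best, wait = loss, 0
            saved_epochs.append(epoch)  # save_best_only=True would write weights here
        else:
            wait += 1
            if wait >= patience:
                return saved_epochs, epoch  # training would stop after this epoch
    return saved_epochs, None  # early stopping never fired

saved, stopped_at = replay_callbacks(val_loss)
print("checkpoint written after epochs:", saved)
print("early stop at epoch:", stopped_at)
print("best val_acc:", max(val_acc), "at epoch", val_acc.index(max(val_acc)) + 1)
```

The best `val_acc` is 0.7037 (epoch 9) and the last one is 0.6976, which is where the roughly 0.69 figure in the question comes from.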

Predicting on the validation dataset

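new_text = np.squeeze(X_val)

with tf.Session() as session:
  K.set_session(session)
  # This runs every variable's initializer in the new session, so the Dense layers
  # get fresh random values, not the weights learned in the (closed) training session
  session.run(tf.global_variables_initializer())
  session.run(tf.tables_initializer())
  predict_proba = model.predict(new_text, batch_size=128)
  # Threshold the sigmoid output at 0.5 to get class labels
  prediction = np.where(predict_proba > 0.50, 1, 0)

print(classification_report(y_val, np.ravel(prediction)))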

Output

Exception ignored in: <bound method BaseSession._Callable.__del__ of <tensorflow.python.client.session.BaseSession._Callable object at 0x7f737c5ff940>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1473, in __del__
    self._session._session, self._handle)
tensorflow.python.framework.errors_impl.CancelledError: (None, None, 'Session has been closed.')

              precision    recall  f1-score   support

           0       0.00      0.00      0.00      2340
           1       0.40      1.00      0.57      1557

    accuracy                           0.40      3897
   macro avg       0.20      0.50      0.29      3897
weighted avg       0.16      0.40      0.23      3897

Unique pred [1]
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/classification.py:1437: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
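The report above is exactly what a constant classifier that labels every sample 1 produces, which matches `Unique pred [1]` and the probabilities of 0.75 to 0.79 all lying above the 0.5 threshold. The sketch below recomputes the metrics by hand from the support counts alone, in plain Python, mirroring sklearn's convention of reporting 0.0 precision for a class with no predicted samples.

```python
# Support counts from the classification report (y_val)
support = {0: 2340, 1: 1557}
n = sum(support.values())

# Every sample is predicted as class 1 ("Unique pred [1]")
tp = {0: 0, 1: support[1]}   # true positives per class
predicted = {0: 0, 1: n}     # number of samples predicted as each class

def prf(c):
    # sklearn sets precision to 0.0 (and warns) when a class is never predicted
    precision = tp[c] / predicted[c] if predicted[c] else 0.0
    recall = tp[c] / support[c]
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

per_class = {c: prf(c) for c in support}
accuracy = tp[1] / n
macro = [sum(m[i] for m in per_class.values()) / len(per_class) for i in range(3)]
weighted = [sum(per_class[c][i] * support[c] for c in support) / n for i in range(3)]

for c, (p, r, f) in per_class.items():
    print(f"class {c}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"accuracy={accuracy:.2f}")
print("macro avg   :", [round(x, 2) for x in macro])
print("weighted avg:", [round(x, 2) for x in weighted])
```

An accuracy of 0.40 is below both the 0.69 to 0.70 val_acc in the training log and the 0.60 that always guessing the majority class (0) would reach on these labels, so the predictions are not merely skewed toward the majority class; they all fall on the minority one.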

0 answers