表格数据自动编码器的高 mse 损失

时间:2021-04-26 17:02:36

标签: python tensorflow keras neural-network autoencoder

我正在尝试在欺诈检测数据集 (https://www.kaggle.com/kartik2112/fraud-detection?select=fraudTest.csv) 上运行自动编码器以进行降维,并且每次迭代都会收到非常高的损失值。下面是自动编码器代码。

nb_epoch = 100
batch_size = 128
input_dim = X_train.shape[1]
encoding_dim = 14
hidden_dim = int(encoding_dim / 2)
learning_rate = 1e-7

input_layer = Input(shape=(input_dim, ))
encoder = Dense(encoding_dim, activation="tanh", activity_regularizer=regularizers.l1(learning_rate))(input_layer)
encoder = Dense(hidden_dim, activation="relu")(encoder)
decoder = Dense(hidden_dim, activation='tanh')(encoder)
decoder = Dense(input_dim, activation='relu')(decoder)
autoencoder = Model(inputs=input_layer, outputs=decoder)


autoencoder.compile(metrics=['accuracy'],
                    loss='mean_squared_error',
                    optimizer='adam')

cp = ModelCheckpoint(filepath="autoencoder_fraud.h5",
                               save_best_only=True,
                               verbose=0)

tb = TensorBoard(log_dir='./logs',
                histogram_freq=0,
                write_graph=True,
                write_images=True)

history = autoencoder.fit(X_train, X_train,
                    epochs=nb_epoch,
                    batch_size=batch_size,
                    shuffle=True,
                    validation_data=(X_test, X_test),
                    verbose=1,
                    callbacks=[cp, tb]).history

这是损失值的片段。

Epoch 1/100
10131/10131 [==============================] - 32s 3ms/step - loss: 52445827358.6230 - accuracy: 0.3389 - val_loss: 9625651200.0000 - val_accuracy: 0.5083
Epoch 2/100
10131/10131 [==============================] - 30s 3ms/step - loss: 52393605025.8066 - accuracy: 0.5083 - val_loss: 9621398528.0000 - val_accuracy: 0.5083
Epoch 3/100
10131/10131 [==============================] - 30s 3ms/step - loss: 52486496629.1354 - accuracy: 0.5082 - val_loss: 9617147904.0000 - val_accuracy: 0.5083
Epoch 4/100
10131/10131 [==============================] - 30s 3ms/step - loss: 52514002255.9432 - accuracy: 0.5070 - val_loss: 9612887040.0000 - val_accuracy: 0.5083
Epoch 5/100
10131/10131 [==============================] - 30s 3ms/step - loss: 52436489238.6388 - accuracy: 0.5076 - val_loss: 9608664064.0000 - val_accuracy: 0.5083
Epoch 6/100
10131/10131 [==============================] - 31s 3ms/step - loss: 52430005774.7556 - accuracy: 0.5081 - val_loss: 9604417536.0000 - val_accuracy: 0.5083
Epoch 7/100
10131/10131 [==============================] - 31s 3ms/step - loss: 52474495714.5898 - accuracy: 0.5079 - val_loss: 9600195584.0000 - val_accuracy: 0.5083
Epoch 8/100
10131/10131 [==============================] - 31s 3ms/step - loss: 52423052560.0695 - accuracy: 0.5076 - val_loss: 9595947008.0000 - val_accuracy: 0.5083
Epoch 9/100
10131/10131 [==============================] - 30s 3ms/step - loss: 52442358260.0742 - accuracy: 0.5072 - val_loss: 9591708672.0000 - val_accuracy: 0.5083
Epoch 10/100
10131/10131 [==============================] - 30s 3ms/step - loss: 52402494704.5369 - accuracy: 0.5089 - val_loss: 9587487744.0000 - val_accuracy: 0.5083
Epoch 11/100
10131/10131 [==============================] - 31s 3ms/step - loss: 52396583628.3553 - accuracy: 0.5081 - val_loss: 9583238144.0000 - val_accuracy: 0.5083
Epoch 12/100
10131/10131 [==============================] - 31s 3ms/step - loss: 52349824708.2700 - accuracy: 0.5076 - val_loss: 9579020288.0000 - val_accuracy: 0.5083
Epoch 13/100
10131/10131 [==============================] - 31s 3ms/step - loss: 52332072133.6850 - accuracy: 0.5083 - val_loss: 9574786048.0000 - val_accuracy: 0.5083
Epoch 14/100
10131/10131 [==============================] - 30s 3ms/step - loss: 52353680011.6731 - accuracy: 0.5086 - val_loss: 9570555904.0000 - val_accuracy: 0.5083
Epoch 15/100
10131/10131 [==============================] - 30s 3ms/step - loss: 52347432594.5456 - accuracy: 0.5088 - val_loss: 9566344192.0000 - val_accuracy: 0.5083
Epoch 16/100
10131/10131 [==============================] - 30s 3ms/step - loss: 52327825554.3435 - accuracy: 0.5076 - val_loss: 9562103808.0000 - val_accuracy: 0.5083
Epoch 17/100
10131/10131 [==============================] - 30s 3ms/step - loss: 52347251610.1255 - accuracy: 0.5080 - val_loss: 9557892096.0000 - val_accuracy: 0.5083
Epoch 18/100
10131/10131 [==============================] - 30s 3ms/step - loss: 52292632667.3636 - accuracy: 0.5079 - val_loss: 9553654784.0000 - val_accuracy: 0.5083
Epoch 19/100
10131/10131 [==============================] - 30s 3ms/step - loss: 52354135093.7671 - accuracy: 0.5083 - val_loss: 9549425664.0000 - val_accuracy: 0.5083
Epoch 20/100
10131/10131 [==============================] - 30s 3ms/step - loss: 52295668148.2006 - accuracy: 0.5086 - val_loss: 9545219072.0000 - val_accuracy: 0.5083
Epoch 21/100
10131/10131 [==============================] - 30s 3ms/step - loss: 52314219115.3320 - accuracy: 0.5079 - val_loss: 9540980736.0000 - val_accuracy: 0.5083
Epoch 22/100
10131/10131 [==============================] - 30s 3ms/step - loss: 52328022934.0829 - accuracy: 0.5079 - val_loss: 9536788480.0000 - val_accuracy: 0.5083
Epoch 23/100
10131/10131 [==============================] - 30s 3ms/step - loss: 52268139834.5172 - accuracy: 0.5074 - val_loss: 9532554240.0000 - val_accuracy: 0.5083
Epoch 24/100
10131/10131 [==============================] - 30s 3ms/step - loss: 52308370726.3040 - accuracy: 0.5077 - val_loss: 9528341504.0000 - val_accuracy: 0.5083
Epoch 25/100
10131/10131 [==============================] - 30s 3ms/step - loss: 52224468101.4070 - accuracy: 0.5081 - val_loss: 9524126720.0000 - val_accuracy: 0.5083
Epoch 26/100
10131/10131 [==============================] - 30s 3ms/step - loss: 52200100823.1694 - accuracy: 0.5080 - val_loss: 9519915008.0000 - val_accuracy: 0.5083

任何建议/解决方案将不胜感激。谢谢

1 个答案:

答案 0 :(得分:-1)

<块引用>

我已经使用 StandardScaler 缩放了数值数据并进行了编码 使用 LabelEncoder 的分类数据

首先,检查您缩放的数字数据。 我认为您错误地缩放了 cc_num,因为 cc_num 是一个分类列。

这应该可以解决您的高损失问题,但这并不意味着您的模型会很好。 您应该首先对特征进行良好的检查,并尝试在标签和特征之间获得一些有用的关系(数据预处理/特征化)