I have some data that I want to classify.
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2474 entries, 0 to 5961
Data columns (total 4 columns):
Age 2474 non-null int64
Pre_Hospitalization_Disposal 2474 non-null object
Injury_to_hospital_time 2474 non-null float64
Discharge_results 2474 non-null int64
dtypes: float64(1), int64(2), object(1)
memory usage: 96.6+ KB
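Age, Pre_Hospitalization_Disposal, and Injury_to_hospital_time are the features.
Discharge_results is the target I want to predict.
I checked that my data contains no null values: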
print(len(DataSet.index[(pd.isnull(DataSet['Age'])) |
(pd.isnull(DataSet['Pre_Hospitalization_Disposal'])) |
(pd.isnull(DataSet['Injury_to_hospital_time'])) |
(pd.isnull(DataSet['Discharge_results']))]))
My code:
(train, test) = train_test_split(DataSet, test_size=0.2, random_state=42)
trainY = train["Discharge_results"].astype('float')
testY = test["Discharge_results"].astype('float')
cs = MinMaxScaler()
trainContinuous = cs.fit_transform(train[['Age','Injury_to_hospital_time']])
testContinuous = cs.transform(test[['Age','Injury_to_hospital_time']])
zipBinarizer = LabelBinarizer().fit(DataSet["Pre_Hospitalization_Disposal"])
trainCategorical = zipBinarizer.transform(train["Pre_Hospitalization_Disposal"])
testCategorical = zipBinarizer.transform(test["Pre_Hospitalization_Disposal"])
trainX = np.hstack([trainCategorical, trainContinuous])
testX = np.hstack([testCategorical, testContinuous])
model = Sequential()
model.add(Dense(16, input_dim=trainX.shape[1] ,activation="relu"))
model.add(Dense(8, activation="relu"))
model.add(Dense(1, activation="softmax"))
model.compile(loss="sparse_categorical_crossentropy", optimizer='Adam')
history = model.fit(trainX, trainY, validation_data=(testX, testY),epochs=200, batch_size=32)
But during training the loss is NaN.
Output:
Train on 1979 samples, validate on 495 samples
Epoch 1/10
1979/1979 [==============================] - 2s 1ms/step - loss: nan - val_loss: nan
Epoch 2/10
1979/1979 [==============================] - 0s 165us/step - loss: nan - val_loss: nan
Epoch 3/10
1979/1979 [==============================] - 0s 139us/step - loss: nan - val_loss: nan
Epoch 4/10
1979/1979 [==============================] - 0s 137us/step - loss: nan - val_loss: nan
Epoch 5/10
1979/1979 [==============================] - 0s 137us/step - loss: nan - val_loss: nan
Epoch 6/10
1979/1979 [==============================] - 0s 141us/step - loss: nan - val_loss: nan
Epoch 7/10
1979/1979 [==============================] - 0s 138us/step - loss: nan - val_loss: nan
Epoch 8/10
1979/1979 [==============================] - 0s 141us/step - loss: nan - val_loss: nan
Epoch 9/10
1979/1979 [==============================] - 0s 140us/step - loss: nan - val_loss: nan
Epoch 10/10
1979/1979 [==============================] - 0s 144us/step - loss: nan - val_loss: nan
Can anyone help me? Thanks a lot!
Answer 0 (score: 1)
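It looks to me like there is a mismatch between your labels and your training loss. The loss sparse_categorical_crossentropy is meant for classification models with multiple classes. To use it, the labels should be integers (the index of the correct class), but in your code I see that the labels are cast to float: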
trainY = train["Discharge_results"].astype('float')
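As a rough illustration of what "index of the correct class" means, here is a minimal pure-Python sketch that maps raw label values to contiguous integer indices. The helper `encode_labels` and the sample values are hypothetical and not taken from the question's data; the actual values stored in Discharge_results are not shown in the post.

```python
def encode_labels(raw_labels):
    """Map arbitrary label values to contiguous integer class indices 0..n_classes-1."""
    classes = sorted(set(raw_labels))                      # distinct label values, in a stable order
    index_of = {value: i for i, value in enumerate(classes)}
    return [index_of[value] for value in raw_labels], classes


# Made-up labels for illustration only (assumption: three distinct outcomes).
raw_labels = [1.0, 0.0, 2.0, 1.0]
encoded, classes = encode_labels(raw_labels)
n_classes = len(classes)

print(encoded, n_classes)  # [1, 0, 2, 1] 3
```

The value of `n_classes` computed this way is also the number of units the output layer would need when using this loss.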
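In addition, the last Dense layer of the model should have n_classes units, not just 1. With a single unit, softmax has nothing to normalize against and always outputs 1.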
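To see why a single softmax unit cannot work, the following pure-Python sketch reimplements softmax and the per-sample sparse categorical cross-entropy by hand. These are illustrative reimplementations, not the Keras internals, and the two-class setup is an assumption made for the example. A one-unit output has no entry for class index 1, which is one plausible route to an undefined loss.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(label, probs):
    """Negative log-probability of the true class; label is an integer class index."""
    return -math.log(probs[label])


# One output unit: whatever the logit is, softmax returns [1.0].
single_unit_outputs = [softmax([z]) for z in (-5.0, 0.0, 3.2)]

# A label of 1 has no matching entry in a one-element output.
try:
    sparse_categorical_crossentropy(1, softmax([0.7]))
    label_out_of_range = False
except IndexError:
    label_out_of_range = True

# Two output units (assumed n_classes = 2): the outputs form a proper distribution.
two_unit_probs = softmax([0.5, 1.5])
loss_for_class_1 = sparse_categorical_crossentropy(1, two_unit_probs)
```

With two units the loss is a finite positive number that shrinks as the probability assigned to the true class grows, which is what gives the optimizer something to minimize.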
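If the labels really are floats, you are probably dealing with a regression problem and should use a different loss function (for example mean_squared_error).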