My dataset contains values in exponential notation, like this:
2.15E-06 -0.000556462 0.000197385 -0.000919 -0.000578077....
Here is the code:
####----Data----####
quest=pd.read_csv("inputFile.csv", names=["A1","A2",..."A200","Sim"])
print(quest.head())
####----Set up Data and Label----####
X=quest.drop('Sim',axis=1)
y=quest['Sim']
####----Train Test Split----####
X_train, X_test, y_train, y_test = train_test_split(X, y)
np.isfinite(X_train).any(), np.isfinite(y_train).any(),np.isfinite(X_test).any()
np.isnan(X_train).any(), np.isnan(y_train).any(), np.isnan(X_test).any()
####----Data Pre-Processing----####
scaler=StandardScaler()
# Fit only to the training data
X_scaled=scaler.fit(X_train)
# Now apply the transformations to the data:
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
####----Training the Model----####
mlp=MLPClassifier(hidden_layer_sizes=(13,13,13), max_iter=500)
mlp.fit(X_train,y_train)
print(mlp)
####----Predictions and Evaluation----####
predictions=mlp.predict(X_test)
print(confusion_matrix(y_test,predictions))
print(classification_report(y_test,predictions))
I get this error:
Traceback (most recent call last): File "E:\thesis\sk-ANN.py", line 67,
X_scaled=scaler.fit(X_train)...
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
The same error appears when fitting the model and when predicting, at "mlp.fit(X_train,y_train)" and "predictions = mlp.predict(X_test)".
How can I get rid of this error?
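Answer 0 (score: 0)
I think you need to preprocess your dataset. The error means the input contains NaN or infinite values, so you can replace the NaN values with something meaningful, such as the mean of the column. Some useful answers can be found here: numpy array: replace nan values with average of columns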
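As a rough illustration of that suggestion, here is a minimal sketch using only NumPy. The helper name `replace_nonfinite_with_column_mean` and the array `X_demo` are made up for this example and are not part of the question's code. It also treats infinite values as missing, which is an assumption about what you want. Note that the check in the question uses `.any()`, which only tells you that at least one value is finite; `np.isfinite(X).all()` is the check that confirms every value is usable.

```python
import numpy as np

def replace_nonfinite_with_column_mean(X):
    """Return a float copy of X with NaN/inf entries replaced by their column mean.

    Hypothetical helper for illustration. A column that is entirely
    NaN/inf has no mean and would stay NaN.
    """
    X = np.array(X, dtype=float)        # copy, so the caller's data is untouched
    X[~np.isfinite(X)] = np.nan         # assumption: treat +/-inf like missing values
    col_means = np.nanmean(X, axis=0)   # per-column mean, ignoring NaN
    rows, cols = np.where(np.isnan(X))  # positions that need filling
    X[rows, cols] = col_means[cols]
    return X

# Made-up values in the style of the question's data
X_demo = np.array([[2.15e-06, -0.000556462],
                   [np.nan, -0.000919],
                   [0.000197385, np.inf]])

# Diagnose first: how many bad entries does each column have?
bad_per_column = (~np.isfinite(X_demo)).sum(axis=0)
print(bad_per_column)

X_clean = replace_nonfinite_with_column_mean(X_demo)
print(np.isfinite(X_clean).all())
```

In the question's script this would probably go before `scaler.fit(X_train)`, for example by passing `X_train.values` and `X_test.values` through the helper. If `y` can also contain missing labels, dropping those rows is likely safer than filling them in.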