I am trying to apply a Gaussian Naive Bayes model to a dataset to predict disease. Predicting on the training data works fine, but when I try to predict on the test data I get a ValueError.

    runfile('D:/ROFI/ML/Heart Disease/prediction.py', wdir='D:/ROFI/ML/Heart Disease')
    Traceback (most recent call last):
      File "", line 1, in <module>
        runfile('D:/ROFI/ML/Heart Disease/prediction.py', wdir='D:/ROFI/ML/Heart Disease')
      File "C:\Users\User\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
        execfile(filename, namespace)
      File "C:\Users\User\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
        exec(compile(f.read(), filename, 'exec'), namespace)
      File "D:/ROFI/ML/Heart Disease/prediction.py", line 85, in <module>
        predict(x_train, y_train, x_test, y_test)
      File "D:/ROFI/ML/Heart Disease/prediction.py", line 73, in predict
        predict_data = model.predict(x_test)
      File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\naive_bayes.py", line 65, in predict
        jll = self._joint_log_likelihood(X)
      File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\naive_bayes.py", line 429, in _joint_log_likelihood
        n_ij -= 0.5 * np.sum(((X - self.theta_[i, :]) ** 2) /
    ValueError: operands could not be broadcast together with shapes (294,14) (15,)

What is the problem here?

    import pandas
    from sklearn import metrics
    from sklearn.preprocessing import Imputer
    from sklearn.naive_bayes import GaussianNB


    def load_data(feature_columns, predicted_column):
        train_data_frame = pandas.read_excel("training_data.xlsx")
        test_data_frame = pandas.read_excel("testing_data.xlsx")
        data_frame = pandas.read_excel("data_set.xlsx")

        x_train = train_data_frame[feature_columns].values
        y_train = train_data_frame[predicted_column].values
        x_test = test_data_frame[feature_columns].values
        y_test = test_data_frame[predicted_column].values

        x_train, x_test = impute(x_train, x_test)
        return x_train, y_train, x_test, y_test


    def impute(x_train, x_test):
        fill_missing = Imputer(missing_values=-9, strategy="mean", axis=0)
        x_train = fill_missing.fit_transform(x_train)
        x_test = fill_missing.fit_transform(x_test)
        return x_train, x_test


    def predict(x_train, y_train, x_test, y_test):
        model = GaussianNB()
        model.fit(x_train, y_train.ravel())
        predicted_data = model.predict(x_test)
        accuracy = metrics.accuracy_score(y_test, predicted_data)
        print("Accuracy of our naive bayes model is : %.2f" % (accuracy * 100))
        return predicted_data


    feature_columns = ["age", "sex", "chol", "cigs", "years", "fbs", "trestbps", "restecg", "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]
    predicted_column = ["cp"]

    x_train, y_train, x_test, y_test = load_data(feature_columns, predicted_column)
    predict(x_train, y_train, x_test, y_test)

N.B.: Both files have the same number of columns.

Answer 0 (score: 1)

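I found the error. It is caused by Imputer. Imputer replaces the missing values in a dataset, but if a column consists entirely of missing values, it drops that column instead. My test dataset had one column in which every value was missing, so Imputer removed it. That left 14 columns in the test array against the 15 the model was trained on, which is the shape mismatch reported in the ValueError. I removed the name of that all-missing column from the feature_columns list, and it worked.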
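
The column-dropping behaviour described above can be reproduced on a small scale. The following is only a minimal sketch, not the code from the question: it assumes a newer scikit-learn release, where `Imputer` has been replaced by `sklearn.impute.SimpleImputer`, and the toy arrays are made-up stand-ins for the real spreadsheets. It also shows one possible alternative to deleting the column: fit the imputer on the training data only and reuse it with `transform` on the test data, so the all-missing test column is filled with the training mean and the column count stays the same.

```python
import numpy as np
from sklearn.impute import SimpleImputer

MISSING = -9  # sentinel used for missing values in the question's data

# Hypothetical toy data standing in for the real training and test sheets.
x_train = np.array([[63.0, 1.0, 233.0],
                    [41.0, MISSING, 250.0],
                    [57.0, 0.0, 204.0]])
# In the test set the last column is missing in every row.
x_test = np.array([[52.0, 1.0, MISSING],
                   [48.0, 0.0, MISSING]])

# Refitting on the test set: the all-missing column has no mean,
# so the imputer discards it and the array loses a column.
refit = SimpleImputer(missing_values=MISSING, strategy="mean").fit_transform(x_test)

# Fitting on the training set only, then reusing those means on the test set.
imputer = SimpleImputer(missing_values=MISSING, strategy="mean")
x_train_filled = imputer.fit_transform(x_train)
x_test_filled = imputer.transform(x_test)

print(refit.shape)          # (2, 2)
print(x_test_filled.shape)  # (2, 3)
```

Whether filling an entirely missing test column with training means is sensible depends on the data; removing the column from `feature_columns`, as done above, remains the simpler choice when the feature carries no information in the test set.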