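I am testing a naive Bayes classifier with scikit-learn and storing the data in NumPy arrays. My data comes from a .csv file with a header row; four columns contain strings and one contains numeric values. The structure of the csv file is: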
columnOne columnTwo columnThree columnFour columnFive
string string string numeric string
Here is my code:
from sklearn import metrics
from sklearn.naive_bayes import GaussianNB
import numpy as np
import csv
with open('data.csv', 'rU') as csvfile:
    location_reader = csv.reader(csvfile, delimiter=',',
                                 quotechar='"')
    # Header contains feature names
    row = location_reader.next()
    feature_names = np.array(row)
    # Load dataset, and target classes
    location_X, location_y = [], []
    for row in location_reader:
        location_X.append(row)
        location_y.append(row[4])  # The target value is column five
location_X = np.array(location_X)
location_y = np.array(location_y)
model = GaussianNB()
model.fit(location_X, location_y)
print(model)
# make predictions
expected = location_y
predicted = model.predict(location_X)
# summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))
Can anyone offer some insight into why I might be getting this error and how to resolve it?
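A minimal sketch of one thing to try, assuming the error comes from GaussianNB being unable to convert the string columns to floats (an assumption, since the message itself is not quoted here). The helper `encode_rows`, its parameters, and the sample rows are hypothetical; the idea is to replace each string feature with an integer code, parse the numeric column with `float()`, and keep the target column out of the features.

```python
def encode_rows(rows, target_index=4, numeric_indices=(3,)):
    """Split rows into numeric features and string labels.

    String feature columns are replaced by integer codes (one code per
    distinct value, in order of first appearance); numeric columns are
    parsed with float(). The target column is kept out of the features.
    """
    codes = {}  # column index -> {string value: integer code}
    X, y = [], []
    for row in rows:
        features = []
        for i, value in enumerate(row):
            if i == target_index:
                continue  # the label must not also be a feature
            if i in numeric_indices:
                features.append(float(value))
            else:
                column_codes = codes.setdefault(i, {})
                code = column_codes.setdefault(value, len(column_codes))
                features.append(float(code))
        X.append(features)
        y.append(row[target_index])
    return X, y


# Hypothetical rows in the same layout as the csv (header already skipped).
rows = [
    ["a", "x", "p", "1.5", "yes"],
    ["b", "x", "q", "2.0", "no"],
    ["a", "y", "p", "0.5", "yes"],
]
X, y = encode_rows(rows)
```

The resulting `X` and `y` could then go through `np.array(...)` and into `model.fit(...)` as in the code above. Note that GaussianNB would treat the integer codes as continuous values, so this is only a rough workaround; one-hot encoding, or a classifier meant for categorical features such as scikit-learn's `CategoricalNB`, may be a better fit.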