I am building a Naive Bayes classifier with sklearn on the Breast Cancer Dataset. Here is the code:
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB
col = ['ID','a1','a2','a3','a4','a5','a6','a7','a8','a9','class']
data = pd.read_csv("data.csv", names=col, header=None)
data = data.drop('ID', axis=1)
msk = np.random.rand(len(data)) < 0.66
train = data[msk]
test = data[~msk]
y = train['class']
X = train.drop('class', axis=1)
labels = test['class']
test = test.drop('class', axis=1)
clf = GaussianNB()
'''
y1 = np.array(y, dtype=pd.Series)
X1 = np.array(X, dtype=pd.Series)
'''
clf.fit(X, y)
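The boolean mask np.random.rand(len(data)) < 0.66 produces a different split on every run, and the split is only approximately 66/34. If an exact and reproducible split is wanted, sklearn's train_test_split is a common alternative (in current scikit-learn it lives in sklearn.model_selection). A minimal sketch on a made-up stand-in for data.csv; the random integer features and the class labels 2 and 4 are assumptions for illustration only:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for data.csv: 100 rows, nine integer features a1..a9
# in the range 1..10 and a class label that is either 2 or 4.
rng = np.random.default_rng(0)
cols = ['a%d' % i for i in range(1, 10)]
data = pd.DataFrame(rng.integers(1, 11, size=(100, 9)), columns=cols)
data['class'] = rng.choice([2, 4], size=100)

X = data.drop('class', axis=1)
y = data['class']

# test_size=0.34 fixes the proportion, random_state makes the split reproducible,
# and stratify=y keeps the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.34, random_state=42, stratify=y
)
print(len(X_train), len(X_test))
```

Unlike the mask approach, rerunning this gives the same rows in each part, which makes results from different classifiers directly comparable.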
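The dataset is split at random into roughly 66% training data and 34% test data. I am using sklearn's Gaussian NB classifier. When I run it, I get the following error: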
Traceback (most recent call last):
File "<ipython-input-9-3c90e612af8d>", line 1, in <module>
runfile('C:/Users/Keshav/Desktop/Spring/ML/Project/Breast Cancer/main1a.py', wdir='C:/Users/Keshav/Desktop/Spring/ML/Project/Breast Cancer')
File "C:\Users\Keshav\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 580, in runfile
execfile(filename, namespace)
File "C:/Users/Keshav/Desktop/Spring/ML/Project/Breast Cancer/main1a.py", line 37, in <module>
clf.fit(X, y)
File "C:\Users\Keshav\Anaconda\lib\site-packages\sklearn\naive_bayes.py", line 163, in fit
self.theta_[i, :] = np.mean(Xi, axis=0)
File "C:\Users\Keshav\Anaconda\lib\site-packages\numpy\core\fromnumeric.py", line 2727, in mean
out=out, keepdims=keepdims)
File "C:\Users\Keshav\Anaconda\lib\site-packages\numpy\core\_methods.py", line 69, in _mean
ret, rcount, out=ret, casting='unsafe', subok=False)
TypeError: unsupported operand type(s) for /: 'str' and 'long'
With the same training and test sets I can run an SVM (sklearn) successfully, but GaussianNB fails.
Any idea how to fix this?
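The traceback shows np.mean failing on a / between a str and a number, which suggests that at least one feature column was read as strings rather than numbers. A minimal sketch of how to find and coerce such columns, assuming (hypothetically) that missing values in data.csv are written as '?'. The CSV rows below are made-up values in the same 11-column layout, and dropping incomplete rows is only one option (imputation is another):

```python
import io

import pandas as pd
from sklearn.naive_bayes import GaussianNB

# Made-up rows in the same layout as data.csv (ID, a1..a9, class);
# the '?' in column a6 is an assumed missing-value marker.
csv_text = """1,5,1,1,1,2,1,3,1,1,2
2,5,4,4,5,7,10,3,2,1,2
3,3,1,1,1,2,2,3,1,1,2
4,8,7,5,10,7,9,5,5,4,4
5,8,4,5,1,2,?,7,3,1,4
6,4,1,1,3,2,1,3,1,1,2
7,8,10,10,8,7,10,9,7,1,4
8,10,7,7,6,4,10,4,1,2,4
"""

col = ['ID'] + ['a%d' % i for i in range(1, 10)] + ['class']
data = pd.read_csv(io.StringIO(csv_text), names=col, header=None)
data = data.drop('ID', axis=1)

# A single '?' makes pandas load the whole column as strings, not numbers.
non_numeric = list(data.select_dtypes(exclude='number').columns)
print(non_numeric)  # ['a6']

# Convert every column to numbers; entries that cannot be parsed become NaN.
data = data.apply(lambda s: pd.to_numeric(s, errors='coerce'))
# Drop rows that contained a missing value.
data = data.dropna()

X = data.drop('class', axis=1)
y = data['class']
clf = GaussianNB().fit(X, y)
```

Alternatively, passing na_values='?' to pd.read_csv makes pandas treat '?' as NaN at load time, so the affected column is parsed as floats from the start.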