My data is in a csv file with the following format:
45,45,34,34,34,56,52,88,50,46,46,1
28,26,23,22,32,36,21,18,8,28,40,0
28,46,57,42,46,51,48,48,40,46,34,1
11,11,11,34,17,13,11,46,11,33,40,0
42,36,46,32,28,51,48,56,38,46,40,1
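and so on.
I am trying to use a binary classifier that classifies the input data given in the first 11 columns; the 12th column indicates accept (1) or reject (0). I am using Python with the pandas and numpy modules. How can I implement naive Bayes on this data?
I am getting a data conversion error: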
ValueError: could not convert string to float
This is my code so far:
import pandas as pd
import numpy as np
from sklearn.cross_validation import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
df = pd.read_csv(r'file.csv')
features=df.values[:,:11]
target=df.values[:,12]
features_train, features_test, target_train, target_test = train_test_split(
    features, target, test_size = 0.33, random_state = 10)
clf=GaussianNB()
clf.fit(features_train, target_train)
target_pred = clf.predict(features_test)
Answer 0 (score: 0)
You are reading the csv incorrectly; you need to do the following:
df = pd.read_csv(r'file.csv', skipinitialspace=True, header=None)
There are spaces after the comma separators, and there is no header row. Reading the file this way yields:
Out[18]:
    0   1   2   3   4   5   6   7   8   9  10  11
0  45  45  34  34  34  56  52  88  50  46  46   1
1  28  26  23  22  32  36  21  18   8  28  40   0
2  28  46  57  42  46  51  48  48  40  46  34   1
The dtypes are now numeric:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 12 columns):
0     3 non-null int64
1     3 non-null int64
2     3 non-null int64
3     3 non-null int64
4     3 non-null int64
5     3 non-null int64
6     3 non-null int64
7     3 non-null int64
8     3 non-null int64
9     3 non-null int64
10    3 non-null int64
11    3 non-null int64
dtypes: int64(12)
memory usage: 368.0 bytes
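For reference, here is a minimal end-to-end sketch of how the corrected read could feed into GaussianNB. It is an illustration under assumptions, not the asker's exact setup: the five sample rows from the question are inlined through io.StringIO as a stand-in for file.csv, with a space after each comma as this answer describes. Two further changes are assumptions on my part: the label is taken from column index 11 (the 12th column, since indexing starts at 0), and train_test_split is imported from sklearn.model_selection, where it lives in current scikit-learn releases. The stratify argument is added only because the sample is tiny.

```python
import io

import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Stand-in for file.csv: the five sample rows from the question.
# The space after each comma is an assumption based on the answer.
CSV_TEXT = """45, 45, 34, 34, 34, 56, 52, 88, 50, 46, 46, 1
28, 26, 23, 22, 32, 36, 21, 18, 8, 28, 40, 0
28, 46, 57, 42, 46, 51, 48, 48, 40, 46, 34, 1
11, 11, 11, 34, 17, 13, 11, 46, 11, 33, 40, 0
42, 36, 46, 32, 28, 51, 48, 56, 38, 46, 40, 1
"""

# header=None keeps the first data row as data instead of column names;
# skipinitialspace=True drops the blank after each comma.
df = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True, header=None)

# Columns 0..10 are the features; column 11 is the accept/reject label.
features = df.iloc[:, :11].to_numpy()
target = df.iloc[:, 11].to_numpy()

# stratify=target keeps both classes in each split of this small sample.
features_train, features_test, target_train, target_test = train_test_split(
    features, target, test_size=0.33, random_state=10, stratify=target
)

clf = GaussianNB()
clf.fit(features_train, target_train)
target_pred = clf.predict(features_test)

accuracy = accuracy_score(target_test, target_pred)
print("accuracy:", accuracy)
```

With only five rows the accuracy figure is not meaningful; on the real file, replace io.StringIO(CSV_TEXT) with the file path.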