How do I implement Naive Bayes on the following data?

Time: 2017-06-13 08:35:17

Tags: python pandas numpy scikit-learn anaconda

My data is in a CSV file in the following format:

45,45,34,34,34,56,52,88,50,46,46,1
28,26,23,22,32,36,21,18,8,28,40,0
28,46,57,42,46,51,48,48,40,46,34,1
11,11,11,34,17,13,11,46,11,33,40,0
42,36,46,32,28,51,48,56,38,46,40,1

and so on.

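I am trying to build a binary classifier for this data: the first 11 columns are the input features, and the 12th column indicates accept (1) or reject (0). I am using Python's pandas and numpy modules. How can I implement Naive Bayes on this data?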
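In pandas and NumPy, column positions are 0-based, so "the first 11 columns" are positions 0 to 10 and "the 12th column" is position 11; position 12 does not exist, and df.values[:, 12] raises IndexError. A minimal check on a hypothetical frame built from the first three sample rows, selecting features and label by position:

```python
import pandas as pd

# Hypothetical frame: the first three sample rows, 12 columns labelled 0..11.
df = pd.DataFrame([
    [45, 45, 34, 34, 34, 56, 52, 88, 50, 46, 46, 1],
    [28, 26, 23, 22, 32, 36, 21, 18, 8, 28, 40, 0],
    [28, 46, 57, 42, 46, 51, 48, 48, 40, 46, 34, 1],
])

features = df.iloc[:, :11].to_numpy()  # "first 11 columns" = positions 0..10
target = df.iloc[:, 11].to_numpy()     # "12th column" = position 11 (accept 1 / reject 0)

# Position 12 is one past the last column, so NumPy rejects it.
try:
    df.values[:, 12]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True

print(features.shape, target, out_of_bounds)
```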

I get a data conversion error:

 ValueError: could not convert string to float

Here is my code so far:

import pandas as pd
import numpy as np
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives here
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

df = pd.read_csv(r'file.csv')
features = df.values[:, :11]  # first 11 columns (positions 0-10)
target = df.values[:, 11]     # 12th column (position 11): accept (1) / reject (0)

features_train, features_test, target_train, target_test = train_test_split(
    features, target, test_size=0.33, random_state=10)

clf = GaussianNB()
clf.fit(features_train, target_train)
target_pred = clf.predict(features_test)
print(accuracy_score(target_test, target_pred))
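GaussianNB models each of the 11 feature columns, within each class, as an independent normal distribution. To show what fit and predict compute, here is a from-scratch NumPy sketch for understanding only, not a replacement for GaussianNB. The helper names fit_gaussian_nb and predict_gaussian_nb are hypothetical. It estimates a prior, mean and variance per class, then picks the class with the largest log prior plus summed per-feature Gaussian log-likelihood. The small variance floor follows the same idea as GaussianNB's var_smoothing parameter.

```python
import numpy as np

# The five sample rows from the question: 11 feature columns, label in the last column.
data = np.array([
    [45, 45, 34, 34, 34, 56, 52, 88, 50, 46, 46, 1],
    [28, 26, 23, 22, 32, 36, 21, 18, 8, 28, 40, 0],
    [28, 46, 57, 42, 46, 51, 48, 48, 40, 46, 34, 1],
    [11, 11, 11, 34, 17, 13, 11, 46, 11, 33, 40, 0],
    [42, 36, 46, 32, 28, 51, 48, 56, 38, 46, 40, 1],
], dtype=float)
X, y = data[:, :11], data[:, 11].astype(int)


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Hypothetical helper: class priors plus per-class, per-feature mean and variance."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # A feature that is constant within a class (e.g. column 10 for class 0 here) would
    # have zero variance, so add a small floor scaled by the largest feature variance.
    floor = var_smoothing * X.var(axis=0).max()
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + floor
    return classes, priors, means, variances


def predict_gaussian_nb(model, X):
    """Hypothetical helper: argmax over classes of log prior + summed Gaussian log-densities."""
    classes, priors, means, variances = model
    diff = X[:, None, :] - means[None, :, :]  # shape (n_samples, n_classes, n_features)
    # The "naive" independence assumption: per-feature log-densities are simply added.
    log_lik = -0.5 * (
        np.log(2 * np.pi * variances)[None, :, :] + diff ** 2 / variances[None, :, :]
    ).sum(axis=2)
    scores = np.log(priors)[None, :] + log_lik  # shape (n_samples, n_classes)
    return classes[np.argmax(scores, axis=1)]


model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, X))
```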

1 answer:

Answer 0 (score: 0)

You are reading the CSV incorrectly; you need to do this:

df = pd.read_csv(r'file.csv', skipinitialspace=True, header=None)

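There are spaces between the comma-separated values and there is no header row. skipinitialspace=True skips the space after each comma, and header=None stops the first data row from being used as column names. This produces: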

Out[18]: 
   0   1   2   3   4   5   6   7   8   9   10  11
0  45  45  34  34  34  56  52  88  50  46  46   1
1  28  26  23  22  32  36  21  18   8  28  40   0
2  28  46  57  42  46  51  48  48  40  46  34   1

The dtypes are now numeric:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 12 columns):
0     3 non-null int64
1     3 non-null int64
2     3 non-null int64
3     3 non-null int64
4     3 non-null int64
5     3 non-null int64
6     3 non-null int64
7     3 non-null int64
8     3 non-null int64
9     3 non-null int64
10    3 non-null int64
11    3 non-null int64
dtypes: int64(12)
memory usage: 368.0 bytes
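A minimal end-to-end sketch combining this read_csv call with the rest of the question's code. It assumes the real file looks like the five sample rows with a space after each comma (an io.StringIO string stands in for file.csv), and it imports train_test_split from sklearn.model_selection, since sklearn.cross_validation was deprecated in scikit-learn 0.18 and removed in 0.20. It also shows the row that the default header handling silently consumes.

```python
import io

import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Stand-in for file.csv (assumption): the five sample rows, with a space after each comma.
RAW = """45, 45, 34, 34, 34, 56, 52, 88, 50, 46, 46, 1
28, 26, 23, 22, 32, 36, 21, 18, 8, 28, 40, 0
28, 46, 57, 42, 46, 51, 48, 48, 40, 46, 34, 1
11, 11, 11, 34, 17, 13, 11, 46, 11, 33, 40, 0
42, 36, 46, 32, 28, 51, 48, 56, 38, 46, 40, 1
"""

# Default read: the first data row becomes the column names, so one sample is lost.
default = pd.read_csv(io.StringIO(RAW))
print(default.shape)  # (4, 12)

# The answer's fix. With a real file: pd.read_csv(r'file.csv', skipinitialspace=True, header=None)
df = pd.read_csv(io.StringIO(RAW), skipinitialspace=True, header=None)
print(df.shape)  # (5, 12)
assert all(pd.api.types.is_numeric_dtype(t) for t in df.dtypes)  # every column parsed as a number

features = df.values[:, :11]  # positions 0..10
target = df.values[:, 11]     # position 11: accept (1) / reject (0)

features_train, features_test, target_train, target_test = train_test_split(
    features, target, test_size=0.33, random_state=10
)

clf = GaussianNB()
clf.fit(features_train, target_train)
target_pred = clf.predict(features_test)
acc = accuracy_score(target_test, target_pred)
print(acc)
```

With only five rows the split leaves two test samples, so the printed accuracy says nothing about the classifier; it becomes meaningful once the full file is loaded.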