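I'm trying to feed some (numpy) data into the Python sklearn module, but I keep getting an error message.
When I use a sample dataset (stored in a variable named iris), I load it as follows: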
import numpy as np
from sklearn import datasets
iris = datasets.load_diabetes()  # load sample data (the diabetes dataset, despite the variable name)
print(np.shape(iris.data))
print(np.shape(iris.target))
(442, 10)
(442,)
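That works fine. However, when I use my own dataset and convert it to numpy arrays, it fails. I don't understand why, since I explicitly converted it to the same shapes as the iris arrays: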
import pandas as pd
fileLoc = 'C:\\Users\\2018_signal.csv'
data = pd.read_csv(fileLoc)
fl_data = data[['signal', 'sig_dig', 'std_prx']].values  # feature matrix
fl_target = data['actual'].values  # target vector
# fraction is defined elsewhere in the script: the share of rows used for fitting
ml_data = fl_data[0:int(fraction * len(fl_data))]
ml_target = fl_target[0:int(fraction * len(fl_target))]
print(np.shape(ml_data))
print(np.shape(ml_target))
(6663, 3)
(6663,)
The sklearn code is shown below:
start_time = time.time()
SKknn_pred = KNeighborsClassifier(n_neighbors=1, algorithm='ball_tree', metric = 'euclidean').fit(ml_data, ml_target).predict(ml_data)
print("knn --- %s seconds ---" % (time.time() - start_time))
print("Number of mislabeled points out of a total %d points : %d" % (fl_data.shape[0],(fl_target != SKknn_pred).sum()))
l_time.append(['knn', 1000 * (time.time() - start_time)])
I get the error message below. Help!
ValueError Traceback (most recent call last)
<ipython-input-96-91e2b93e2580> in <module>()
57
58 start_time = time.time()
---> 59 SKgnb_pred = GaussianNB().fit(ml_data, ml_target).predict(fl_data)
60 print("gnb --- %s seconds ---" % (time.time() - start_time))
61 print("Number of mislabeled points out of a total %d points : %d" % (fl_data.shape[0],(fl_target != SKgnb_pred).sum()))
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\naive_bayes.py in fit(self, X, y, sample_weight)
183 X, y = check_X_y(X, y)
184 return self._partial_fit(X, y, np.unique(y), _refit=True,
--> 185 sample_weight=sample_weight)
186
187 @staticmethod
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\naive_bayes.py in _partial_fit(self, X, y, classes, _refit, sample_weight)
348 self.classes_ = None
349
--> 350 if _check_partial_fit_first_call(self, classes):
351 # This is the first call to partial_fit:
352 # initialize various cumulative counters
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py in _check_partial_fit_first_call(clf, classes)
319 else:
320 # This is the first call to partial_fit
--> 321 clf.classes_ = unique_labels(classes)
322 return True
323
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py in unique_labels(*ys)
95 _unique_labels = _FN_UNIQUE_LABELS.get(label_type, None)
96 if not _unique_labels:
---> 97 raise ValueError("Unknown label type: %s" % repr(ys))
98
99 ys_labels = set(chain.from_iterable(_unique_labels(y) for y in ys))
ValueError: Unknown label type: (array([-78.375, -67.625, -66.75 , ..., 71.375, 76.75 , 78.1 ]),)
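Answer 0 (score: 0)
Here is a way to fix the error in Python. The classifier expects discrete class labels, but ml_target holds continuous float values, which is why unique_labels raises "Unknown label type". Encoding the target with LabelEncoder turns each distinct value into an integer class: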
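from sklearn import preprocessing
from sklearn.utils.multiclass import type_of_target
print(type_of_target(ml_target))  # 'continuous' before encoding
lab_enc = preprocessing.LabelEncoder()  # this definition was missing from the snippet
ml_target = lab_enc.fit_transform(ml_target)  # each distinct value becomes an integer class
print(type_of_target(ml_target))  # 'multiclass' after encoding
print(type_of_target(ml_target.astype('float')))  # still 'multiclass': the floats are integer-valued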
After the conversion above, the sklearn model fits the data.
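To illustrate the point with something runnable, here is a minimal sketch on synthetic data. The array values, the random seed, and the variable names are made up for illustration and are not taken from the question, and it assumes a reasonably recent scikit-learn and NumPy. It shows the check that fails, the LabelEncoder workaround, and, as an alternative not mentioned in this answer, KNeighborsRegressor, which may be the better fit if the target really is a continuous quantity.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.multiclass import type_of_target

# Synthetic stand-in for the question's data (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))          # same layout as ml_data: (n_samples, 3)
y = rng.normal(scale=30.0, size=200)   # non-integer floats, like ml_target

# The shape matches what a classifier expects, but the label type does not.
target_kind = type_of_target(y)        # 'continuous'

# A classifier rejects a continuous target with a ValueError.
try:
    KNeighborsClassifier(n_neighbors=1).fit(X, y)
    classifier_error = None
except ValueError as exc:
    classifier_error = str(exc)

# Workaround 1: encode every distinct value as an integer class.
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)
encoded_kind = type_of_target(y_encoded)   # 'multiclass'
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y_encoded)
# Map predicted class indices back to the original float values.
clf_pred = encoder.inverse_transform(clf.predict(X))

# Workaround 2 (an alternative, not part of the answer above):
# treat the task as regression and keep the target as floats.
reg = KNeighborsRegressor(n_neighbors=1).fit(X, y)
reg_pred = reg.predict(X)
```

Note that label encoding treats every distinct float as its own class, so the classifier can only ever predict values it saw during fit. Whether that is acceptable depends on what the 'actual' column represents.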