I am trying to use Isolation Forest as a classifier in Python 2.7 (Anaconda distribution). Here is my sample code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest
rng = np.random.RandomState(42)
import pandas
from pandas import read_csv
from numpy import set_printoptions
filename1 = 'path/Cleanedinput.csv'
dataframe1 = read_csv(filename, names=names,low_memory=False)
Xtrain = dataframe1.values
Xtrain.shape
(996405L, 16L)
Xtrain[0:2]
array([[1744121620.0, 2590000000.0, '44846', '39770', '6', '100', 1L, '5', '290', 60L, '1', 1L, '-6', '46846', 12.9833, 77.5833],
[1724121520.0, 2260000000.0, '12337', '31772', '6', '100', 1L, '1', '54', 60L, '1', 1L, '-6', '41637', 23.4833, 24.123]], dtype=object)
clf = IsolationForest(max_samples=10, random_state=rng)
clf.fit(X_train)
My Xtrain array looks like this:
array([[1744121620.0, 2590000000.0, '44846', '39770', '6', '100', 1L, '5', '290', 60L, '1', 1L, '-6', '46846', 12.9833, 77.5833],
[1724121520.0, 2260000000.0, '12337', '31772', '6', '100', 1L, '1', '54', 60L, '1', 1L, '-6', '41637', 23.4833, 24.123]], dtype=object)
But I get a ValueError:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-21-0a80fca9c379> in <module>()
----> 1 clf.fit(X_train)
C:\Anaconda\lib\site-packages\sklearn\ensemble\iforest.pyc in fit(self, X, y, sample_weight)
157 # ensure_2d=False because there are actually unit test checking we fail
158 # for 1d.
--> 159 X = check_array(X, accept_sparse=['csc'], ensure_2d=False)
160 if issparse(X):
161 # Pre-sort indices to avoid that each individual tree of the
C:\Anaconda\lib\site-packages\sklearn\utils\validation.pyc in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
380 force_all_finite)
381 else:
--> 382 array = np.array(array, dtype=dtype, order=order, copy=copy)
383
384 if ensure_2d:
ValueError: could not convert string to float: -
Am I missing something regarding the data types?
Answer 0 (score: 0)
Part of the data in your Xtrain variable is represented as strings rather than numerical values.
In the Xtrain you provided,
array([[1744121620.0, 2590000000.0, '44846', '39770', '6', '100', 1L, '5', '290', 60L, '1', 1L, '-6', '46846', 12.9833, 77.5833], [1724121520.0, 2260000000.0, '12337', '31772', '6', '100', 1L, '1', '54', 60L, '1', 1L, '-6', '41637', 23.4833, 24.123]], dtype=object)
values such as '44846' and '39770' are strings.
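One way to see which cells are the problem is to try parsing every value and list the ones that fail. The snippet below is only a sketch: `X_sample` is a small made-up stand-in for the real Xtrain, and the `'-'` cell is an assumption based on the `could not convert string to float: -` message in the traceback.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Xtrain: mixed floats and strings, dtype=object.
# The '-' cell mimics the value named in the ValueError message.
X_sample = np.array([
    [1744121620.0, '44846', '-6', 12.9833],
    [1724121520.0, '12337', '-', 23.4833],
], dtype=object)

df = pd.DataFrame(X_sample)

# Try to parse every cell as a number; cells that cannot be parsed become NaN.
parsed = df.apply(pd.to_numeric, errors='coerce')

# A cell is "bad" if it held a value originally but failed to parse.
bad_mask = parsed.isna() & df.notna()

# Collect the offending (row, column, value) triples.
rows, cols = np.where(bad_mask.values)
bad_cells = [(int(r), int(c), df.iat[r, c]) for r, c in zip(rows, cols)]
print(bad_cells)
```

Numeric-looking strings such as '44846' or '-6' parse fine, so only values like a lone '-' show up in `bad_cells`.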
Look at the dtype of this Xtrain: it is object. Convert the array to a numeric dtype and it should work.
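A minimal sketch of that conversion follows, assuming pandas is available. `X_sample` is invented for illustration, and dropping rows that fail to parse is just one possible choice, not something stated in the question. A plain `Xtrain.astype(float)` would raise the same ValueError if a cell holds a lone '-', so the sketch uses `pd.to_numeric` with `errors='coerce'` instead.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Xtrain (dtype=object, strings mixed with floats).
X_sample = np.array([
    [1744121620.0, '44846', '-6', 12.9833],
    [1724121520.0, '12337', '-', 23.4833],
    [1734121570.0, '20001', '4', 40.5],
], dtype=object)

df = pd.DataFrame(X_sample)

# Parse each column as numbers; unparseable cells (such as '-') become NaN.
numeric = df.apply(pd.to_numeric, errors='coerce')

# Assumption: rows with unparseable cells are simply dropped here.
# Imputing a value instead would be an equally valid choice.
clean = numeric.dropna()

# Plain float64 array, ready to be passed to an estimator.
X_numeric = clean.values.astype(np.float64)
print(X_numeric.dtype, X_numeric.shape)
```

The resulting `X_numeric` can then be passed to `clf.fit(...)` in place of the object array.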