在数据集上拟合决策树分类器时发生ValueError

时间:2018-11-17 13:41:54

标签: python machine-learning scikit-learn random-forest

我为正在处理的数据集创建了特征X和标签y。

这时,我想在其上训练随机森林分类器,但在将分类器拟合到训练数据上时遇到了ValueError:setting an array element with a sequence.

在X和y功能以及错误详细信息下面:

X:

(array([-8.1530527e-10,  8.9952795e-10, -9.1185753e-10, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([-0.00050612, -0.00057967, -0.00035985, ...,  0.        ,
         0.        ,  0.        ], dtype=float32),
 array([ 6.8139506e-08, -2.3837963e-05, -2.4622474e-05, ...,
         3.1678758e-06, -2.4535689e-06,  0.0000000e+00], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
         6.9306935e-07, -6.6020442e-07,  0.0000000e+00], dtype=float32),
 array([-7.30260945e-05, -1.18022966e-04, -1.08280736e-04, ...,
         8.83421380e-05,  4.97258679e-06,  0.00000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([ 2.3406714e-05,  3.1186773e-05,  4.9467826e-06, ...,
         1.2180173e-07, -9.2944845e-08,  0.0000000e+00], dtype=float32),
 array([ 1.1845550e-06, -1.6399191e-06,  2.5565218e-06, ...,
        -8.7445065e-09,  5.9859917e-09,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([-1.3284328e-05, -7.4090644e-07,  7.2679302e-07, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
         5.0694009e-08, -3.4546797e-08,  0.0000000e+00], dtype=float32),
 array([ 1.5591205e-07, -1.5845627e-07,  1.5362870e-07, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 1.1608539e-05,
        8.2463991e-09, 0.0000000e+00], dtype=float32),
 array([-3.6192148e-07, -1.4590451e-05, -5.3999561e-06, ...,
        -1.9935460e-05, -3.4417746e-05,  0.0000000e+00], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -2.5319534e-07,  2.6521766e-07,  0.0000000e+00], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -2.5055220e-08,  1.2936166e-08,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([ 1.3387315e-05,  6.0913658e-07, -5.6471418e-07, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([ 1.7200684e-02,  3.2272514e-02,  3.2961801e-02, ...,
        -1.6286784e-06, -8.5592075e-07,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -3.3923173e-11,  2.8026699e-11,  0.0000000e+00], dtype=float32),
 array([-0.00103188, -0.00075814, -0.00051426, ...,  0.        ,
         0.        ,  0.        ], dtype=float32),
 array([ 7.6278877e-07,  2.1624428e-05,  1.1150542e-05, ...,
         1.8263392e-09, -1.5558380e-09,  0.0000000e+00], dtype=float32),
 array([-1.2111740e-07,  6.3130176e-07, -1.8378003e-06, ...,
         1.1309878e-05,  5.4562256e-06,  0.0000000e+00], dtype=float32),
 array([0.00026949, 0.00028119, 0.00020081, ..., 0.00032586, 0.00046612,
        0.        ], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -7.8796054e-09,  1.7431153e-08,  0.0000000e+00], dtype=float32),
 array([1.42000988e-06, 1.30781755e-05, 2.77493709e-05, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00], dtype=float32),
 array([ 2.9161662e-10, -6.3629275e-11, -3.0565092e-10, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([ 2.2051008e-05,  1.6838792e-05,  3.5639907e-05, ...,
         4.5767497e-06, -1.2002213e-05,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -2.0104826e-10,  1.6824393e-10,  0.0000000e+00], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -4.8303300e-06, -1.2008861e-05,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -2.7673337e-07,  2.8604177e-07,  0.0000000e+00], dtype=float32),
 array([-0.00066044, -0.0009837 , -0.00090796, ..., -0.00171516,
        -0.0017666 ,  0.        ], dtype=float32),
 array([ 3.2218946e-11, -5.5296181e-11,  8.9530647e-11, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([-1.3284328e-05, -7.4090644e-07,  7.2679302e-07, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([ 4.9886359e-05,  1.4642075e-04,  4.4365996e-04, ...,
         6.3584002e-07, -6.2395281e-07,  0.0000000e+00], dtype=float32),
 array([-3.2826196e-04,  4.5522624e-03, -8.2306744e-04, ...,
        -2.2519816e-07, -6.2417300e-08,  0.0000000e+00], dtype=float32),
 array([ 3.1686827e-04,  4.6282235e-04,  1.0160641e-04, ...,
        -1.4605960e-05,  6.6572487e-05,  0.0000000e+00], dtype=float32),
 array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00, ...,
        -7.1763244e-09, -2.8297892e-08,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([-2.5870585e-07,  4.6514080e-07, -9.5607948e-07, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([ 5.788035e-07, -6.493598e-07,  7.111379e-07, ...,  0.000000e+00,
         0.000000e+00,  0.000000e+00], dtype=float32),
 array([ 2.5118000e-04,  1.4220485e-03,  3.9536849e-04, ...,
         4.5242754e-04, -3.1405249e-05,  0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([ 1.1985266e-07,  2.1360799e-07, -1.1951373e-06, ...,
        -1.3043609e-04,  1.2107374e-06,  0.0000000e+00], dtype=float32),
 array([0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 2.5944988e-08,
        1.2123945e-07, 0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
 array([-2.4280996e-06, -1.2362683e-05, -8.5034850e-07, ...,
        -1.0113516e-11,  5.1403621e-12,  0.0000000e+00], dtype=float32),
 array([9.6098862e-05, 1.6449913e-04, 1.1942573e-04, ..., 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00], dtype=float32),
 array([ 1.3284328e-05,  7.4090644e-07, -7.2679302e-07, ...,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype=float32),
 array([ 2.4700081e-05,  2.9454704e-05,  8.0751715e-06, ...,
         1.2746801e-07, -1.6574201e-06,  0.0000000e+00], dtype=float32),
 array([8.4619669e-06, 9.7476968e-06, 2.0182479e-05, ..., 2.1081217e-11,
        4.0220186e-10, 0.0000000e+00], dtype=float32),
 array([0., 0., 0., ..., 0., 0., 0.], dtype=float32))

y下方

('08',
 '08',
 '06',
 '05',
 '05',
 '04',
 '06',
 '07',
 '01',
 '04',
 '03',
 '07',
 '03',
 '01',
 '03',
 '03',
 '02',
 '02',
 '02',
 '02',
 '05',
 '06',
 '04',
 '08',
 '07',
 '06',
 '04',
 '05',
 '07',
 '02',
 '08',
 '01',
 '08',
 '03',
 '08',
 '02',
 '03',
 '06',
 '04',
 '07',
 '04',
 '07',
 '05',
 '06',
 '08',
 '08',
 '04',
 '05',
 '05',
 '04',
 '06',
 '07',
 '05',
 '07',
 '01',
 '06',
 '02',
 '02',
 '03',
 '03')

分类器代码以及训练/测试分组:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

from sklearn.tree import DecisionTreeClassifier
dtree = DecisionTreeClassifier()
dtree.fit(X_train, y_train)

错误:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-70-b6417fbfb8de> in <module>()
      1 from sklearn.tree import DecisionTreeClassifier
      2 dtree = DecisionTreeClassifier()
----> 3 dtree.fit(X_train, y_train)

/usr/local/lib/python3.6/dist-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    788             sample_weight=sample_weight,
    789             check_input=check_input,
--> 790             X_idx_sorted=X_idx_sorted)
    791         return self
    792 

/usr/local/lib/python3.6/dist-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    114         random_state = check_random_state(self.random_state)
    115         if check_input:
--> 116             X = check_array(X, dtype=DTYPE, accept_sparse="csc")
    117             y = check_array(y, ensure_2d=False, dtype=None)
    118             if issparse(X):

/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    431                                       force_all_finite)
    432     else:
--> 433         array = np.array(array, dtype=dtype, order=order, copy=copy)
    434 
    435         if ensure_2d:

ValueError: setting an array element with a sequence.

EDIT1 :我将X和y都转换为numpy数组,但收到的错误是相同的,详细信息如下

import numpy as np
X = np.asarray(X)
y = np.asarray(y)


X.shape, y.shape

输出:

((60,), (60,))

1 个答案:

答案 0 :(得分:1)

问题似乎出在您的X。构成它的数组之一长度可能不同,这会导致您生成的元组,并且在由DecisionTreeClassifier处理时,由Scikit-learn转换为Numpy数组。 ,以转换为字符串向量,而决策树函数不希望这些向量处理。

只需检查以下代码段即可:

X1 = (array([-8.1530527e-10,  8.9952795e-10, -9.1185753e-10,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype='float32'),
 array([0., 0., 0., 0., 0., 0.], dtype='float32'),
 array([0., 0., 0., 0., 0., 0.], dtype='float32'))

X2 = (array([-8.1530527e-10,  8.9952795e-10, -9.1185753e-10,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00], dtype='float32'),
 array([0., 0., 0., 0., 0., 0., 1], dtype='float32'),
 array([0., 0., 0., 0., 0., 0.], dtype='float32'))

print("X1:", np.array(X1).dtype, "\nX2:", np.array(X2).dtype)

只需更改X2的第二个元素,再加上一个数字,就会使X2数组变成字符串数组(对象类型)。