Question

I'm using a scikit-learn model (ExtraTreesRegressor) to do supervised feature selection.

I made a toy example to keep things as clear as possible. Here is the toy code:

import pandas as pd
import numpy as np
from  sklearn.ensemble import ExtraTreesRegressor
from itertools import chain

# Original Dataframe
df = pd.DataFrame({"A": [[10,15,12,14],[20,30,10,43]], "R":[2,2] ,"C":[2,2] , "CLASS":[1,0]})
X = np.array([np.array(df.A).reshape(1,4) , df.C , df.R])
Y = np.array(df.CLASS)

# prints
X = np.array([np.array(df.A), df.C , df.R])
Y = np.array(df.CLASS)

print("X",X)
print("Y",Y) 
print(df)
df['A'].apply(lambda x: print("ORIGINAL SHAPE",np.array(x).shape,"field:",x))
df['A'] = df['A'].apply(lambda x: np.array(x).reshape(4,1),"field:",x)
df['A'].apply(lambda x: print("RESHAPED SHAPE",np.array(x).shape,"field:",x))
model = ExtraTreesRegressor()
model.fit(X,Y)
model.feature_importances_

X [[[10, 15, 12, 14] [20, 30, 10, 43]]
 [2 2]
 [2 2]]

Y [1 0]

                   A  C  CLASS  R
0  [10, 15, 12, 14]  2      1  2
1  [20, 30, 10, 43]  2      0  2
ORIGINAL SHAPE (4,) field: [10, 15, 12, 14]
ORIGINAL SHAPE (4,) field: [20, 30, 10, 43]
---------------------------

This is the exception that is raised:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-37-5a36c4c17ea0> in <module>()
      7 print(df)
      8 model = ExtraTreesRegressor()
----> 9 model.fit(X,Y)

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/sklearn/ensemble/forest.py in fit(self, X, y, sample_weight)
    210         """
    211         # Validate or convert input data
--> 212         X = check_array(X, dtype=DTYPE, accept_sparse="csc")
    213         if issparse(X):
    214             # Pre-sort indices to avoid that each individual tree of the

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    371                                       force_all_finite)
    372     else:
--> 373         array = np.array(array, dtype=dtype, order=order, copy=copy)
    374 
    375         if ensure_2d:

ValueError: setting an array element with a sequence.

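I noticed that this has to do with np.arrays. So I tried fitting another toy DataFrame, the most basic one possible, containing only scalars, and no error occurred. I then kept the same code and only modified that toy DataFrame by adding another field containing one-dimensional arrays, and now the same exception is raised.

I've looked around, but so far I haven't found a solution, even after trying some reshaping, converting to lists, np.array and so on, and building matrices in my real problem. For now I'm still working in that direction.

I've also seen that this kind of problem usually arises when the arrays have different lengths across samples, but that is not the case in the toy example.

Does anyone know how to deal with this structure/exception? Thanks in advance for your help.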

Answer 1

Convert the pandas DataFrame to a NumPy matrix:

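import numpy as np
import pandas as pd

def df2mat(df):
    # .values works across pandas versions; as_matrix() was removed in pandas 1.0
    a = df.values
    n = a.shape[0]      # number of rows (samples)
    m = len(a[0])       # length of each inner list
    b = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            b[i, j] = a[i][j]
    return b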

df = pd.DataFrame({"A":[[1,2],[3,4]]})
b = df2mat(df.A)

After that, concatenate.
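The concatenation step is not spelled out above. A minimal sketch of one way to do it, assuming the toy DataFrame from the question; np.column_stack and the tolist() conversion are choices made here for illustration, not something this answer prescribes:

```python
import numpy as np
import pandas as pd

# Toy DataFrame from the question.
df = pd.DataFrame({"A": [[10, 15, 12, 14], [20, 30, 10, 43]],
                   "R": [2, 2], "C": [2, 2], "CLASS": [1, 0]})

# Turn the column of lists into a proper (n_samples, 4) numeric matrix.
A = np.array(df["A"].tolist(), dtype=float)

# column_stack treats 1-D inputs as columns, so C and R each add one column.
X = np.column_stack([A, df["C"].to_numpy(), df["R"].to_numpy()])
Y = df["CLASS"].to_numpy()

print(X.shape)  # (2, 6)
```

Each row of X is then one sample with six plain numeric features, which is the 2D layout that fit() expects.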

Answer 2

Take a closer look at your X:

>>> X
array([[[10, 15, 12, 14], [20, 30, 10, 43]],
       [2, 2],
       [2, 2]], dtype=object)
>>> type(X[0,0])
<class 'list'>

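Note that it is dtype=object, and one of those objects is a list, hence "setting an array element with a sequence". Part of the problem is that np.array(df.A) does not correctly create a 2D array: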

>>> np.array(df.A)
array([[10, 15, 12, 14], [20, 30, 10, 43]], dtype=object)
>>> _.shape
(2,)  # oops!

But using np.stack(df.A) fixes that.
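To see both halves of that in one place, here is a small sketch. It assumes only column A of the question's toy DataFrame; the float32 cast is meant to mirror the np.array(array, dtype=dtype) call in the traceback, and the variable names are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [[10, 15, 12, 14], [20, 30, 10, 43]]})

# Object array whose elements are Python lists: shape (2,), not (2, 4).
obj = df["A"].to_numpy()
print(obj.dtype, obj.shape)

# Casting that object array to a numeric dtype fails, because each
# element is a whole list rather than a single number.
try:
    np.array(obj, dtype=np.float32)
    cast_failed = False
except ValueError as exc:
    cast_failed = True
    print("ValueError:", exc)

# np.stack treats each list as one row and builds a real 2-D array.
# (A plain list of rows is passed here to keep the input type unambiguous.)
stacked = np.stack(df["A"].tolist())
print(stacked.dtype, stacked.shape)
```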

You are looking for:

>>> X = np.concatenate([
        np.stack(df.A),                 # condense A to (N, 4)
        np.expand_dims(df.C, axis=-1),  # expand C to (N, 1)
        np.expand_dims(df.R, axis=-1),  # expand R to (N, 1)
    ], axis=-1)
>>> X
array([[10, 15, 12, 14,  2,  2],
       [20, 30, 10, 43,  2,  2]], dtype=int64)
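With X in that shape the fit from the question should go through. Below is a rough end-to-end sketch, assuming the same toy DataFrame; n_estimators, random_state and the per-column grouping of importances are illustrative choices, not part of the original code:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor

df = pd.DataFrame({"A": [[10, 15, 12, 14], [20, 30, 10, 43]],
                   "R": [2, 2], "C": [2, 2], "CLASS": [1, 0]})

# Build the flat (n_samples, 6) feature matrix: 4 columns from A, then C, then R.
X = np.concatenate([
    np.stack(df["A"].tolist()),
    np.expand_dims(df["C"].to_numpy(), axis=-1),
    np.expand_dims(df["R"].to_numpy(), axis=-1),
], axis=-1)
Y = df["CLASS"].to_numpy()

# Small, seeded forest so the toy run is quick and repeatable.
model = ExtraTreesRegressor(n_estimators=10, random_state=0)
model.fit(X, Y)

# One importance per flattened column; sum the first four to score A as a whole.
importances = model.feature_importances_
grouped = {"A": importances[:4].sum(), "C": importances[4], "R": importances[5]}
print(grouped)
```

Keep in mind that after flattening, feature_importances_ has one entry per column of X, so the entries for A have to be summed (or inspected individually) if you want a score for the original field.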
