Question

I am trying to build a random forest classifier for binary classification. My data is imbalanced, so I am undersampling it.

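from imblearn.under_sampling import RandomUnderSampler
from sklearn import metrics

# Features: drop the columns not used for training, including the target 'Resi'
train = data.drop(['Co_Name','Cust_ID','Phone','Shpr_ID','Resi_Cnt','Buz_Cnt','Nearby_Cnt','parseNumber','removeString','Qty','bins','Adj_Addr','Resi','Weight','Resi_Area','Lat','Lng'], axis=1)
Y = data['Resi']  # target
rus = RandomUnderSampler(random_state=42)
# fit_sample was renamed to fit_resample in imbalanced-learn 0.4
X_train_res, y_train_res = rus.fit_sample(train, Y)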

I get the following error:

    446         # make sure we actually converted to numeric:
    447         if dtype_numeric and array.dtype.kind == "O":
--> 448             array = array.astype(np.float64)
    449         if not allow_nd and array.ndim >= 3:
    450             raise ValueError("Found array with dim %d. %s expected <= 2."

ValueError: setting an array element with a sequence.

How can I fix this?

Answer 1

Can you share the DataFrame, or at least a sample of it?

This error can have several causes. For example:

If you try:

np.asarray([[1, 2], [2, 3, 4]], dtype=np.float64)

you will get:

ValueError: setting an array element with a sequence.

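This happens because the rows do not have the same length: the first list has two elements and the second has three. NumPy cannot build a rectangular 2-D float array from them, so the conversion fails.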
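The same message also appears when an object-dtype array whose cells hold sequences is converted with astype(np.float64), which is the call shown in the traceback in the question. A minimal sketch with made-up values (the variable names are only for illustration):

```python
import numpy as np

# Object array whose cells hold lists (made-up values, not the asker's data).
cells = np.empty(2, dtype=object)
cells[0] = [1, 2]
cells[1] = [2, 3, 4]

try:
    # Same kind of conversion as the astype(np.float64) line in the traceback.
    cells.astype(np.float64)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

If a column of train holds lists or arrays in its cells, this is probably the conversion that fails.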

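In your case, though, the error is probably related to the shapes of train and Y, or to the data types in train. While the undersampler is being fitted, the data is converted to a numeric array, and that conversion is what raises this error. Before running RandomUnderSampler, check that every column in train has a suitable (numeric) type.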
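One way to run that check is sketched below. It assumes train is a pandas DataFrame; the toy frame, its column names, and the helper columns_with_sequences are hypothetical and only stand in for the real data.

```python
import numpy as np
import pandas as pd

def columns_with_sequences(frame):
    """Return object-dtype columns that hold a list, tuple, or array in any cell."""
    found = []
    for col in frame.columns:
        if frame[col].dtype != object:
            continue
        has_seq = frame[col].map(
            lambda v: isinstance(v, (list, tuple, np.ndarray))
        ).any()
        if has_seq:
            found.append(col)
    return found

# Toy stand-in for `train` (hypothetical columns and values).
train_demo = pd.DataFrame({
    "Dist": [0.5, 1.2, 3.3],
    "Tokens": [[1, 2], [2, 3, 4], [5]],
    "City": ["a", "b", "c"],
})

# Columns that are neither numeric nor boolean.
non_numeric = train_demo.select_dtypes(exclude=["number", "bool"]).columns.tolist()
# Columns whose cells contain sequences, the likely source of the ValueError.
sequence_cols = columns_with_sequences(train_demo)

print(train_demo.dtypes)
print(non_numeric, sequence_cols)
```

Any column reported here would need to be dropped or encoded as numbers before train is passed to RandomUnderSampler. In current imbalanced-learn releases the method to call is fit_resample.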

Error when performing undersampling for Sklearn