Question

First of all, I am new to machine learning.

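I am trying to predict the price of used cars. Each car has a make and a model, so I use MultiLabelBinarizer to build a sparse matrix for the categorical attributes. Here is the code: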

from sklearn.preprocessing import MultiLabelBinarizer
encoder = MultiLabelBinarizer()
make_cat_1hot = encoder.fit_transform(make_cat)
model_cat_1hot = encoder.fit_transform(model_cat)
type_cat_1hot = encoder.fit_transform(type_cat)

print(type(make_cat_1hot))
carInfoModHot = carsInfoMod.copy()
carInfoModHot["makeHot"] = make_cat_1hot.tolist()
carInfoModHot["modelHot"] = model_cat_1hot.tolist()
carInfoModHot["typeHot"] = type_cat_1hot.tolist()



doors   km      make        year    makeHot                         modelHot
5.0     78779   Mercedes    2012    [0, 0, 0, 0, 1, 0, 0, 0, ...    [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, ...
5.0     25463   Bmw         2015    [0, 1, 0, 0, 0, 0, 0, ...       [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, ...
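
A side note on the output above: the modelHot rows contain several 1s each, which is not what a one-hot encoding looks like. One possible explanation, assuming make_cat, model_cat and type_cat are sequences of plain strings (the question does not show them), is that MultiLabelBinarizer iterates over each string and treats every character as a separate label. The sample values below are hypothetical and only illustrate that behaviour:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical sample; the real make_cat is not shown in the question.
make_cat = ["Bmw", "Audi"]

# Passing plain strings: each string is iterated character by character,
# so the learned classes are single letters rather than makes.
char_encoder = MultiLabelBinarizer()
char_encoded = char_encoder.fit_transform(make_cat)
char_classes = list(char_encoder.classes_)
print(char_classes)
print(char_encoded)

# Wrapping each value in a one-element list makes every make a single label.
label_encoder = MultiLabelBinarizer()
label_encoded = label_encoder.fit_transform([[make] for make in make_cat])
label_classes = list(label_encoder.classes_)
print(label_classes)
print(label_encoded)
```

If that assumption holds, the wrapped form produces exactly one 1 per row, which is the usual one-hot layout.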

Then I use it to make predictions with linear regression and compute the mean squared error:

lr = linear_model.LinearRegression()

carsInfoTrainHot = carInfoModHot.drop(["price"], axis=1) # drop labels for training set

df1 = carsInfoTrainHot.iloc[:30000, :]
carsLabels1 = carsInfoMod.iloc[:30000, 3]
print(carsInfoTrainHot.head())
df2 = carsInfoTrainHot.iloc[30001:60000, :]
carsLabels2 = carsInfoMod.iloc[30001:60000, 3]
df3 = carsInfoTrainHot.iloc[60001:, :]
carsLabels3 = carsInfoMod.iloc[60001:, 3]

lr.fit(df1, carsLabels1) 
print(carsInfoTrainHot.shape)
carPrediction = lr.predict(df2)

lin_mse = mean_squared_error(carsLabels2, carPrediction)

lin_rmse = np.sqrt(lin_mse)

But I get this error:

ValueError                                Traceback (most recent call last)
 in ()
     12 carsLabels3 = carsInfoMod.iloc[60001:, 3]
     13 
---> 14 lr.fit(df1, carsLabels1)
     15 print(carsInfoTrainHot.shape)
     16 carPrediction = lr.predict(df2)

/home/vagrant/anaconda3/lib/python3.6/site-packages/sklearn/linear_model/base.py in fit(self, X, y, sample_weight)
    510         n_jobs_ = self.n_jobs
    511         X, y = check_X_y(X, y, accept_sparse=['csr', 'csc', 'coo'],
--> 512                          y_numeric=True, multi_output=True)
    513 
    514         if sample_weight is not None and np.atleast_1d(sample_weight).ndim > 1:

/home/vagrant/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
    519     X = check_array(X, accept_sparse, dtype, order, copy, force_all_finite,
    520                     ensure_2d, allow_nd, ensure_min_samples,
--> 521                     ensure_min_features, warn_on_dtype, estimator)
    522     if multi_output:
    523         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,

/home/vagrant/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    400         # make sure we actually converted to numeric:
    401         if dtype_numeric and array.dtype.kind == "O":
--> 402             array = array.astype(np.float64)
    403         if not allow_nd and array.ndim >= 3:
    404             raise ValueError("Found array with dim %d. %s expected <= 2."

ValueError: setting an array element with a sequence.
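
The failing line in the traceback is the array.astype(np.float64) call. The following is a minimal sketch of what likely triggers it, using a hypothetical two-row frame rather than the real data: storing the output of tolist() in a column puts a Python list in every cell, the column gets dtype object, and the conversion to float cannot flatten those lists.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame; values are made up for illustration.
df = pd.DataFrame({"km": [78779, 25463]})
df["makeHot"] = [[0, 0, 1], [0, 1, 0]]  # each cell holds a Python list

print(df.dtypes)  # makeHot is reported as object

values = df.values  # 2-D object array whose second column contains lists
try:
    values.astype(np.float64)  # the same conversion check_array attempts
    conversion_failed = False
except (ValueError, TypeError) as exc:
    conversion_failed = True
    print(exc)
```

In other words, the estimator needs one numeric column per encoded category, not one column of lists.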

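As far as I understand, I am inserting an array into each categorical attribute, but how can I convert the categorical values into a sparse matrix instead?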
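
For reference, one possible direction is sketched below. It is only a sketch under assumptions: it swaps MultiLabelBinarizer for OneHotEncoder, which accepts string categories in scikit-learn 0.20 and later (possibly newer than the version in the traceback), and it uses a small made-up frame whose column names follow the question. The encoded categories stay in a SciPy sparse matrix and are stacked next to the numeric columns instead of being stored as lists in DataFrame cells.

```python
import pandas as pd
from scipy import sparse
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows; column names follow the question, values are made up.
cars = pd.DataFrame({
    "doors": [5.0, 5.0, 3.0, 5.0],
    "km": [78779, 25463, 120000, 60000],
    "year": [2012, 2015, 2008, 2013],
    "make": ["Mercedes", "Bmw", "Bmw", "Audi"],
    "price": [20000, 30000, 8000, 18000],
})

# One-hot encode the categorical column; the result is a SciPy sparse matrix.
encoder = OneHotEncoder(handle_unknown="ignore")
make_1hot = encoder.fit_transform(cars[["make"]])

# Convert the numeric columns to sparse and stack everything column-wise.
numeric = sparse.csr_matrix(cars[["doors", "km", "year"]].to_numpy(dtype=float))
X = sparse.hstack([numeric, make_1hot]).tocsr()

# LinearRegression accepts sparse input directly.
lr = LinearRegression()
lr.fit(X, cars["price"])
predictions = lr.predict(X)
print(X.shape, predictions.shape)
```

The same stacking would apply to the model and type columns. When predicting on a held-out split, the encoder fitted on the training rows should be reused with transform rather than fitted again, so that the columns line up.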

Thanks.

Categorical attributes to a sparse matrix

0 Answers: