Question

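I have a data matrix stored in one of the scipy.sparse formats, along with a set of outcomes I need to predict. Basically, I want to fit one linear model per outcome. Since the dataset is quite large (in the tens of thousands), I am using SGDRegressor. Here is my feature matrix: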

In [62]: features
Out[62]: 
<77946x72239 sparse matrix of type '<type 'numpy.float64'>'
    with 1084093 stored elements in LInked List format>

and my outcomes:

In [63]: outcomes
Out[63]: 
<77946x24 sparse matrix of type '<type 'numpy.float64'>'
    with 416487 stored elements in LInked List format>

My question: to train the linear model for the first outcome, why can't I do the following (see the error)? What is the correct way to do this?

In [64]: reg.fit(features, outcomes[:,0])
[...]
ValueError: Shapes of X and y do not match.

Answer 1

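First, y should not be a sparse matrix. Second, its shape should be (n_samples,), not the two-dimensional (n_samples, 1) that slicing a column out of a sparse matrix gives you, so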

y = outcomes[:, 0].toarray().ravel()

should work.
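
To fit one model per outcome, as the question describes, the same conversion can be applied to each column in a loop. The following is only a minimal sketch: the small random matrices stand in for the real `features` and `outcomes`, and the `SGDRegressor` settings (`max_iter`, `tol`, `random_state`) are illustrative assumptions, not tuned values.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import SGDRegressor

# Small synthetic stand-ins for the question's matrices (assumed data).
rng = np.random.RandomState(0)
features = sparse.random(200, 50, density=0.1, format="lil", random_state=rng)
outcomes = sparse.random(200, 3, density=0.3, format="lil", random_state=rng)

# X may stay sparse; convert once to CSR, which suits row-wise access.
X = features.tocsr()

models = []
for j in range(outcomes.shape[1]):
    # The column slice is still sparse with shape (n_samples, 1);
    # densify it and flatten to shape (n_samples,).
    y = outcomes[:, j].toarray().ravel()
    reg = SGDRegressor(max_iter=1000, tol=1e-3, random_state=0)
    reg.fit(X, y)
    models.append(reg)
```

Only y is made dense here. Each column holds n_samples values, so this stays cheap even when the feature matrix itself is far too large to densify.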