Question

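I have a dataset with 16 columns and 100,000 rows that I am preparing for matrix factorization training. I am using the following code to split it and convert it into a sparse matrix.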

X=data.drop([data.columns[0]],axis='columns')
y=data[[1]]
X=lil_matrix(100000,15).astype('float32')
y=np.array(y).astype('float32')
X

But when I run it, I get this error:

<1x1 sparse matrix of type '' with 1 stored elements in LInked List format>

When I try to feed it into a train/test split, I get more errors:

Found input variables with inconsistent numbers of samples: [1, 100000]

Answer 1

The notebook you linked creates a "blank" sparse matrix and then sets selected elements from the data it reads from a csv.

A simple example:

In [565]: from scipy import sparse                                                                           
In [566]: M = sparse.lil_matrix((10,5), dtype=float)                                                         
In [567]: M                                                                                                  
Out[567]: 
<10x5 sparse matrix of type '<class 'numpy.float64'>'
    with 0 stored elements in LInked List format>

Note that I specified the matrix shape with (10,5). The () matter! That is why I stressed reading the docs. In the link, the relevant line is:

X = lil_matrix((lines, columns)).astype('float32')

Now I can set a couple of elements, just as I would with a dense array:

In [568]: M[1,2] = 12.3                                                                                      
In [569]: M[3,1] = 1.1                                                                                       
In [570]: M                                                                                                  
Out[570]: 
<10x5 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in LInked List format>

I can display the matrix as a dense array with toarray (don't try this with large matrices).

In [571]: M.toarray()                                                                                        
Out[571]: 
array([[ 0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. , 12.3,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  1.1,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0. ]])
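
Filling an empty matrix from file records, as the notebook does, might look roughly like the sketch below. The `records` list of (row, column, value) triples is a made-up stand-in for whatever the csv provides, and the final conversion to CSR is a common practice rather than something the notebook is confirmed to do.

```python
from scipy import sparse

# Hypothetical (row, column, value) records standing in for rows read from a csv.
records = [(1, 2, 12.3), (3, 1, 1.1), (9, 4, 7.0)]

# The shape is passed as a single tuple.
M = sparse.lil_matrix((10, 5), dtype='float32')

# LIL format is efficient for setting elements one at a time.
for row, col, value in records:
    M[row, col] = value

# CSR is usually a better format for arithmetic once the matrix is filled.
M_csr = M.tocsr()
print(M_csr.shape, M_csr.nnz)
```

LIL is convenient while the matrix is being built; converting once at the end avoids paying its slower arithmetic during training.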

If I omit the (), it makes a (1,1) matrix with just one element (the first number).

In [572]: sparse.lil_matrix(10,5)                                                                            
Out[572]: 
<1x1 sparse matrix of type '<class 'numpy.int64'>'
    with 1 stored elements in LInked List format>
In [573]: _.A                                                                                                
Out[573]: array([[10]], dtype=int64)

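Look at your code again. You set X twice. The first time it is a dataframe. The second time is the bad lil initialization, and it makes no use of the first X.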

X=data.drop([data.columns[0]],axis='columns')
...
X=lil_matrix(100000,15).astype('float32')
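
One way to make the second assignment actually use the first X is to pass the dataframe's values to the constructor instead of a shape. This is only a sketch under assumptions: the small `data` frame is a made-up stand-in for the 100,000-row dataset, and treating column 0 as the target is a guess, since the question drops column 0 from X but selects `data[[1]]` for y.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Made-up stand-in for the real dataset: 6 rows, 4 integer-labelled columns.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(0, 3, size=(6, 4)))

# Assumption: column 0 is the target and the remaining columns are features.
X_df = data.drop([data.columns[0]], axis='columns')
y = data[data.columns[0]].to_numpy(dtype='float32')

# Build the sparse matrix from the values, not from a shape.
X = sparse.lil_matrix(X_df.to_numpy(dtype='float32'))

# X and y need the same number of rows before a train/test split.
print(X.shape, y.shape)
```

With X and y sharing the same first dimension, the "inconsistent numbers of samples" complaint should no longer be triggered by the shapes.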
