How to write a scipy sparse matrix to a file

Time: 2017-12-09 15:59:45

Tags: python numpy scipy

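I have a scipy sparse matrix in coo format, 1000 x 12000 columns. I want to write it to a file on disk in the following format: one line per row, listing all nonzero columns: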

col_id1:value col_id2:value .... col_id2:value ....

Is there a fast way to do this (without iterating manually)?

1 answer:

Answer 0 (score: 1)

An example of what I suggested in the comments:

In [2]: from scipy import sparse
In [3]: M = sparse.random(10,10,.2)
In [4]: M
Out[4]: 
<10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 20 stored elements in COOrdinate format>
In [5]: print(M)
  (1, 9)    0.61465832998
  (8, 8)    0.894080347124
  (2, 7)    0.709001342736
  (3, 2)    0.809025517922
  (9, 8)    0.974650428753
  (7, 8)    0.495271225449
  (5, 6)    0.356408870324
  (0, 8)    0.57026318308
  (3, 6)    0.69919575217
  (5, 8)    0.226445982654
  (5, 1)    0.191857394963
  (7, 9)    0.121634028589
  (6, 6)    0.815836601813
  (7, 3)    0.585401171842
  (6, 7)    0.526762154792
  (6, 9)    0.775136319014
  (4, 1)    0.517647147906
  (0, 5)    0.484673192725
  (7, 5)    0.72827335905
  (2, 8)    0.527635893465

The lil format collects the values by row:

In [6]: Ml = M.tolil()
In [7]: Ml.rows
Out[7]: 
array([list([5, 8]), list([9]), list([7, 8]), list([2, 6]), list([1]),
       list([1, 6, 8]), list([6, 7, 9]), list([3, 5, 8, 9]), list([8]),
       list([8])], dtype=object)
In [8]: Ml.data
Out[8]: 
array([list([0.4846731927245771, 0.5702631830799726]),
       list([0.6146583299803253]),
       list([0.7090013427361257, 0.5276358934648013]),
       list([0.8090255179222732, 0.6991957521702542]),
       list([0.5176471479060225]),
       list([0.19185739496268694, 0.3564088703236009, 0.2264459826535451]),
       list([0.8158366018134895, 0.5267621547920701, 0.7751363190143352]),
       list([0.5854011718424482, 0.7282733590496102, 0.49527122544858804, 0.12163402858941941]),
       list([0.8940803471238159]), list([0.9746504287533381])], dtype=object)

Format the rows according to your spec, using a loop and a list comprehension:

In [9]: for r,d in zip(Ml.rows, Ml.data):
   ...:     print(' '.join(['%s:%s'%(r1,d1) for r1,d1 in zip(r,d)]))
   ...:     
5:0.4846731927245771 8:0.5702631830799726
9:0.6146583299803253
7:0.7090013427361257 8:0.5276358934648013
2:0.8090255179222732 6:0.6991957521702542
1:0.5176471479060225
1:0.19185739496268694 6:0.3564088703236009 8:0.2264459826535451
6:0.8158366018134895 7:0.5267621547920701 9:0.7751363190143352
3:0.5854011718424482 5:0.7282733590496102 8:0.49527122544858804 9:0.12163402858941941
8:0.8940803471238159
8:0.9746504287533381

Replace the print call with your file write.
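For illustration, the loop above can be wrapped in a small function that writes to disk instead of printing. This is only a sketch: the function name `write_rows_lil` and the `path` argument are made up for this example and are not part of scipy.

```python
from scipy import sparse

def write_rows_lil(M, path):
    """Write one text line per matrix row as 'col:value col:value ...'.

    write_rows_lil and path are illustrative names, not a scipy API.
    """
    Ml = M.tolil()  # lil stores one list of columns and one list of values per row
    with open(path, "w") as f:
        for cols, vals in zip(Ml.rows, Ml.data):
            f.write(" ".join("%s:%s" % (c, v) for c, v in zip(cols, vals)))
            f.write("\n")  # a row with no nonzeros becomes an empty line
```

Calling `write_rows_lil(M, "matrix.txt")` (the file name is arbitrary) produces the same lines as the printed output, one per row. Empty rows are kept as blank lines so that the line number still identifies the row.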

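We are still looping "manually", but access to the data elements is relatively fast. It is certainly faster than indexing M[i,j], which is not possible with the coo format anyway.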

Fast row access is also possible through the csr format attributes, but that requires more understanding of how the data is stored.
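As a rough sketch of that csr route: in csr format, the nonzeros of row i occupy the slice `indptr[i]:indptr[i+1]` of the `indices` (column numbers) and `data` (values) arrays. The function name `write_rows_csr` and the `path` argument are invented for this example.

```python
from scipy import sparse

def write_rows_csr(M, path):
    """Write 'col:value' pairs, one line per row, using the csr arrays.

    write_rows_csr and path are illustrative names, not a scipy API.
    """
    Mr = M.tocsr()
    Mr.sort_indices()  # make sure columns are in ascending order within each row
    with open(path, "w") as f:
        for i in range(Mr.shape[0]):
            # Row i's nonzeros live in indices/data[indptr[i]:indptr[i+1]]
            start, end = Mr.indptr[i], Mr.indptr[i + 1]
            cols = Mr.indices[start:end]
            vals = Mr.data[start:end]
            f.write(" ".join("%s:%s" % (c, v) for c, v in zip(cols, vals)))
            f.write("\n")
```

This avoids building Python lists for every row up front, but it is still a Python-level loop over rows, so I would not expect a dramatic difference for a 1000-row matrix.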

Your : syntax is not a common one, so you will have to do the formatting yourself either way. How do you intend to read this file?