Question

I have a sparse matrix z, a scipy.sparse.csr_matrix of shape (n,m) with n<<m. I also have labels l, which is just an np.array of n strings.

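What I want to do is make a csv file of the "raw" data: all the nonzero values in z[0] go into the csv column whose header is l[0], and so on for each row, but each column will have a different number of values. Unfortunately numpy doesn't handle ragged arrays well, and I'm not sure of an elegant way to build one.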

Right now I'm doing

np.savetxt(pth, z.todense().T, delimiter = ",")

and adding the column headers by hand. My next processing step can cope with all the zeros, but this approach is slow.

Example:

z.toarray()
array([[0,0,1,0,0,-1,0,3,0,-6,4],
       [-1,0,4,0,0,0,0,0,0,0,-2]])

l
array(["chan1", "chan2"])

What I want:

example.csv

chan1, chan2
1,-1
-1,4
3,-2
-6,
4,

Answer 1

In [74]: from scipy import sparse

In [75]: M = sparse.csr_matrix([[0,0,1,0,0,-1,0,3,0,-6,4],
    ...:        [-1,0,4,0,0,0,0,0,0,0,-2]])
In [76]: M
Out[76]: 
<2x11 sparse matrix of type '<class 'numpy.int64'>'
    with 8 stored elements in Compressed Sparse Row format>

In [77]: M.toarray()
Out[77]: 
array([[ 0,  0,  1,  0,  0, -1,  0,  3,  0, -6,  4],
       [-1,  0,  4,  0,  0,  0,  0,  0,  0,  0, -2]], dtype=int64)

The lil format gives the data row by row:

In [78]: Ml = M.tolil()
In [79]: Ml.data
Out[79]: array([list([1, -1, 3, -6, 4]), list([-1, 4, -2])], dtype=object)
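If you'd rather skip the lil conversion, the same per-row lists can be read straight out of the CSR arrays: the stored values of row i are M.data[M.indptr[i]:M.indptr[i+1]]. The helper below is a minimal sketch; the name csr_row_values is hypothetical. It works on a copy, drops explicitly stored zeros with eliminate_zeros() and calls sort_indices() so each row's values come out in column order.

```python
from scipy import sparse

def csr_row_values(M):
    """Hypothetical helper: return the stored nonzero values of each row as a list of lists."""
    # Work on a copy so the caller's matrix is not modified.
    M = sparse.csr_matrix(M, copy=True)
    M.eliminate_zeros()  # remove explicitly stored zeros
    M.sort_indices()     # order each row's entries by column index
    # Row i's values occupy data[indptr[i]:indptr[i + 1]].
    return [M.data[M.indptr[i]:M.indptr[i + 1]].tolist() for i in range(M.shape[0])]

M = sparse.csr_matrix([[0, 0, 1, 0, 0, -1, 0, 3, 0, -6, 4],
                       [-1, 0, 4, 0, 0, 0, 0, 0, 0, 0, -2]])
print(csr_row_values(M))  # [[1, -1, 3, -6, 4], [-1, 4, -2]]
```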

Now it's just a matter of writing these lists to the file in the format you want.
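For the actual file, including the header row and any number of rows n, one option is the standard csv module combined with zip_longest. This is a sketch: write_ragged_columns is a hypothetical name, and columns is any sequence of per-row value lists (Ml.data works, since iterating it yields the lists). Short columns are padded with empty cells.

```python
import csv
import io
from itertools import zip_longest

def write_ragged_columns(f, labels, columns):
    """Hypothetical helper: write each list in `columns` as one CSV column under its label."""
    writer = csv.writer(f)
    writer.writerow(labels)  # header row, e.g. chan1,chan2
    # Transpose the ragged lists into rows, padding short columns with "".
    writer.writerows(zip_longest(*columns, fillvalue=""))

columns = [[1, -1, 3, -6, 4], [-1, 4, -2]]
labels = ["chan1", "chan2"]

buf = io.StringIO()
write_ragged_columns(buf, labels, columns)
print(buf.getvalue())
```

To write a real file instead of a string buffer, open it with `open(pth, "w", newline="")` as the csv module documentation recommends and pass that file object in place of buf.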

In [81]: from itertools import zip_longest

In [82]: for i,j in zip_longest(*Ml.data, fillvalue=''):
    ...:     astr = '%s, %s'%(i,j)
    ...:     print(astr)
    ...:     
1, -1
-1, 4
3, -2
-6, 
4,

zip_longest is a convenient way to iterate over several lists at once, using the longest list as the reference.
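The difference from the built-in zip is easiest to see side by side. With the two rows from the example, plain zip stops at the shorter list and silently drops the last two values of the first row, while zip_longest keeps going and fills the gaps with fillvalue (None if you don't pass one).

```python
from itertools import zip_longest

a = [1, -1, 3, -6, 4]
b = [-1, 4, -2]

# zip stops at the shortest input: -6 and 4 are lost.
print(list(zip(a, b)))
# [(1, -1), (-1, 4), (3, -2)]

# zip_longest runs to the longest input and pads with fillvalue.
print(list(zip_longest(a, b, fillvalue="")))
# [(1, -1), (-1, 4), (3, -2), (-6, ''), (4, '')]
```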
