Creating a sparse matrix from a dataframe of edges

Asked: 2018-05-05 18:24:35

Tags: python pandas numpy

Suppose I have a csv file containing data in the following format:

A B

C D

A C

D F

G H

K M

M A

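Each line gives an undirected edge between node1 and node2. I am currently reading this in as a dataframe, but I would like to convert it to a sparse matrix. Is there a quick and easy way to do this without loops?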

1 answer:

Answer 0: (score: 0)

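To construct a scipy sparse matrix directly, you first have to map the letters to unique integer indices, for example A == 0, B == 1 (zero-based, as `values.index` does in the session that follows):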

In [202]: txt='''A B
     ...: 
     ...: C D
     ...: 
     ...: A C
     ...: 
     ...: D F
     ...: 
     ...: G H
     ...: 
     ...: K M
     ...: 
     ...: M A'''.splitlines()
In [203]: values = 'ABCDEFGHIJKLM'
In [204]: data = [x.split() for x in txt if x]
In [205]: data = [[values.index(x) for x in row] for row in data]
In [206]: data
Out[206]: [[0, 1], [2, 3], [0, 2], [3, 5], [6, 7], [10, 12], [12, 0]]
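If the edges are already in a pandas DataFrame, as in the question, the mapping does not need a hand-written `values` string. The following is a minimal sketch, assuming the two columns are named `node1` and `node2` (hypothetical names, since the question does not give them). It uses `pandas.factorize` on the flattened columns to assign one integer code per label:

```python
import pandas as pd

# Hypothetical DataFrame holding the edge list from the question;
# the column names node1/node2 are assumptions.
df = pd.DataFrame(
    [["A", "B"], ["C", "D"], ["A", "C"], ["D", "F"],
     ["G", "H"], ["K", "M"], ["M", "A"]],
    columns=["node1", "node2"],
)

# Flatten both columns into one 1-D array so each label gets a single code;
# sort=True assigns the codes in sorted label order.
flat = df[["node1", "node2"]].to_numpy().ravel()
codes, labels = pd.factorize(flat, sort=True)

# Reshape back to one (row, col) index pair per edge.
pairs = codes.reshape(-1, 2)
```

This numbers only the labels that actually occur (9 here), so the codes differ from the `values.index` mapping in `Out[206]`, which reserves a slot for each of the 13 letters A to M. `labels[i]` recovers the letter for index `i`.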

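Now `data` holds the coordinate pairs. There are many ways to construct a sparse matrix from them. Conceptually the simplest is to fill a `lil` format matrix one entry at a time (`lil` is the format best suited to incremental construction):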

In [207]: from scipy import sparse
In [208]: M = sparse.lil_matrix((len(values),len(values)),dtype=int)
In [209]: for row in data:
     ...:     M[tuple(row)] = 1
     ...:     
In [210]: M
Out[210]: 
<13x13 sparse matrix of type '<class 'numpy.int64'>'
    with 7 stored elements in LInked List format>
In [211]: M.toarray()
Out[211]: 
array([[0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
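The question asks for a way without loops and describes the edges as undirected, while the matrix above stores each edge in one direction only. As a minimal sketch under those assumptions, the index pairs from `Out[206]` can be passed to `sparse.coo_matrix` in a single call and then mirrored across the diagonal. The variable names here are illustrative:

```python
import numpy as np
from scipy import sparse

# Index pairs copied from Out[206] above.
pairs = np.array([[0, 1], [2, 3], [0, 2], [3, 5], [6, 7], [10, 12], [12, 0]])
n = 13  # len('ABCDEFGHIJKLM')

rows, cols = pairs[:, 0], pairs[:, 1]
ones = np.ones(len(pairs), dtype=int)

# COO format accepts (data, (row, col)) directly, so no Python loop is needed.
directed = sparse.coo_matrix((ones, (rows, cols)), shape=(n, n))

# Mirror each edge for an undirected graph. maximum() keeps an entry at 1
# when an edge is listed once in each direction (e.g. both "A B" and "B A").
undirected = directed.maximum(directed.T).tocsr()
```

`directed.toarray()` should match the array shown in `Out[211]`, and `undirected` holds twice as many stored entries because each edge appears at both `(i, j)` and `(j, i)`. Note that an edge repeated in the same direction would be summed by the COO conversion, so deduplicate the edge list first if that can happen.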