Accessing an sklearn sparse array

Time: 2018-01-28 23:05:03

Tags: python-3.x scipy scikit-learn sparse-matrix minimum-spanning-tree

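I am trying to write a function for the Euclidean minimum spanning tree, and I am stuck on finding the k nearest neighbors. As you can see, I call a function that returns a sparse array containing the indices of, and the distances to, the nearest neighbors, but I cannot access the elements the way I expected: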

for p1, p2, w in A:
    ...  # do things

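because this raises an error saying that A yields only 1 item (not 3). Is there a way to access each entry of this data set so that I can form edges weighted by distance? I am very new to Python and still trying to learn all the ins and outs of the language.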

from sklearn.neighbors import kneighbors_graph
from kruskalsalgorithm import *
import networkx as nx


def EMST(inlist):

    graph = nx.Graph()

    for a,b in inlist:
        graph.add_node((a,b))

    print("nodes = ", graph.nodes())

    A = kneighbors_graph(graph.nodes(),1,mode='distance', metric='euclidean',include_self=False,n_jobs=-1)
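    A.toarray()  # the dense copy is discarded here; A itself stays sparse
    print(A)     # prints the (row, col) distance listing shown in the output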

This is how I test my function:

mylist = [[2,3],[4,2],[9,4],[3,1]]
EMST(mylist)

My output is:

nodes = [(2, 3), (4, 2), (9, 4), (3, 1)]
(0, 1)    2.2360679775
(1, 3)    1.41421356237
(2, 1)    5.38516480713
(3, 1)    1.41421356237

2 Answers:

Answer 0: (score: 1)

You haven't really explained what you want to do. There are many conceivable possibilities.

But in general, you should follow the scipy.sparse documentation. In your case, sklearn's function guarantees that the result is in CSR format.

One possible usage is:

from scipy import sparse as sp
import numpy as np
np.random.seed(1)

mat = sp.random(4,4, density=0.4)
print(mat)

I, J, V = sp.find(mat)
print(I)
print(J)
print(V)

Output:

(3, 0)        0.846310916686
(1, 3)        0.313273516932
(3, 1)        0.524548159573
(2, 0)        0.44345289378
(2, 1)        0.22957721373
(2, 2)        0.534413908947
[2 3 2 3 2 1]
[0 0 1 1 2 3]
[ 0.44345289  0.84631092  0.22957721  0.52454816  0.53441391  0.31327352]

Of course, you can also do:

for a, b, w in zip(I, J, V):
    print(a, b, w)

This prints:

2 0 0.44345289378
3 0 0.846310916686
2 1 0.22957721373
3 1 0.524548159573
2 2 0.534413908947
1 3 0.313273516932
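
Iterating directly over a SciPy sparse matrix yields its rows rather than (i, j, value) triples, which is consistent with the unpacking error in the question. As a minimal sketch of turning the stored entries into a weighted edge list, the snippet below rebuilds the 4x4 matrix from the question's output by hand (an assumption made so that the example runs without sklearn) and sorts the edges by weight, the order in which Kruskal's algorithm would consume them. The names `A`, `coo` and `edges` are illustrative only.

```python
import numpy as np
from scipy import sparse

# Hand-built stand-in for the matrix printed in the question
# (in the question it would come from kneighbors_graph).
row = np.array([0, 1, 2, 3])
col = np.array([1, 3, 1, 1])
data = np.sqrt(np.array([5.0, 2.0, 29.0, 2.0]))
A = sparse.csr_matrix((data, (row, col)), shape=(4, 4))

# COO exposes parallel row/col/data arrays, so zip() yields (i, j, w) triples.
coo = A.tocoo()
edges = [(int(i), int(j), float(w)) for i, j, w in zip(coo.row, coo.col, coo.data)]

# Sort by weight, ascending.
edges.sort(key=lambda e: e[2])
for i, j, w in edges:
    print(i, j, w)
```

The nearest-neighbor relation is directed, so both (1, 3) and (3, 1) appear; in an undirected graph they describe the same edge. With networkx, a list of such triples has the shape that `Graph.add_weighted_edges_from` accepts.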

Answer 1: (score: 1)

I can recreate your display with:

In [65]: from scipy import sparse
In [72]: row = np.array([0,1,2,3])
In [73]: col = np.array([1,3,1,1])
In [74]: data = np.array([5,2,29,2])**.5
In [75]: M = sparse.csr_matrix((data, (row, col)), shape=(4,4))
In [76]: M
Out[76]: 
<4x4 sparse matrix of type '<class 'numpy.float64'>'
    with 4 stored elements in Compressed Sparse Row format>
In [77]: print(M)
  (0, 1)    2.23606797749979
  (1, 3)    1.4142135623730951
  (2, 1)    5.385164807134504
  (3, 1)    1.4142135623730951
In [78]: M.A   # M.toarray()
Out[78]: 
array([[0.        , 2.23606798, 0.        , 0.        ],
       [0.        , 0.        , 0.        , 1.41421356],
       [0.        , 5.38516481, 0.        , 0.        ],
       [0.        , 1.41421356, 0.        , 0.        ]])

These are the distances for pts = [(2, 3), (4, 2), (9, 4), (3, 1)]. The distance from pts[0] to pts[1] is sqrt(5), and so on.

The sparse coo format gives access to the coordinates and distances. sparse.find also produces these arrays.

In [83]: Mc = M.tocoo()
In [84]: Mc.row
Out[84]: array([0, 1, 2, 3], dtype=int32)
In [85]: Mc.col
Out[85]: array([1, 3, 1, 1], dtype=int32)
In [86]: Mc.data
Out[86]: array([2.23606798, 1.41421356, 5.38516481, 1.41421356])
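
If converting to coo is not wanted, the CSR arrays can also be read row by row. The following is a sketch under the assumption that the matrix is in CSR format, as in this session: `indptr[i]` and `indptr[i + 1]` delimit the slice of `indices` (column numbers) and `data` (distances) that belongs to row `i`. The dictionary name `neighbours` is made up for illustration.

```python
import numpy as np
from scipy import sparse

# Same matrix as M in the session above, rebuilt so the snippet is self-contained.
row = np.array([0, 1, 2, 3])
col = np.array([1, 3, 1, 1])
data = np.sqrt(np.array([5.0, 2.0, 29.0, 2.0]))
M = sparse.csr_matrix((data, (row, col)), shape=(4, 4))

# For each row i, collect (neighbour index, distance) pairs from the CSR arrays.
neighbours = {}
for i in range(M.shape[0]):
    start, stop = M.indptr[i], M.indptr[i + 1]
    neighbours[i] = [
        (int(j), float(d))
        for j, d in zip(M.indices[start:stop], M.data[start:stop])
    ]

print(neighbours)
```

This keeps the per-point grouping, which is convenient when more than one neighbor per point is requested.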

Check that the points and the matrix match:

In [95]: pts = np.array([(2, 3), (4, 2), (9, 4), (3, 1)])
In [96]: pts
Out[96]: 
array([[2, 3],
       [4, 2],
       [9, 4],
       [3, 1]])
In [97]: for r,c,d in zip(*sparse.find(M)):
    ...:     print(((pts[r]-pts[c])**2).sum()**.5)
    ...:     
2.23606797749979
5.385164807134504
1.4142135623730951
1.4142135623730951

Or get all the nearest distances at once:

In [107]: np.sqrt(((pts[row,:]-pts[col,:])**2).sum(1))
Out[107]: array([2.23606798, 1.41421356, 5.38516481, 1.41421356])
In [110]: np.linalg.norm(pts[row,:]-pts[col,:],axis=1)
Out[110]: array([2.23606798, 1.41421356, 5.38516481, 1.41421356])

'Brute force' minimum distance calculation:

All pairwise distances:

In [112]: dist = np.linalg.norm(pts[None,:,:]-pts[:,None,:],axis=2)
In [113]: dist
Out[113]: 
array([[0.        , 2.23606798, 7.07106781, 2.23606798],
       [2.23606798, 0.        , 5.38516481, 1.41421356],
       [7.07106781, 5.38516481, 0.        , 6.70820393],
       [2.23606798, 1.41421356, 6.70820393, 0.        ]])

(Compare this with Out[78].)

'Blank out' the diagonal:

In [114]: D = dist + np.eye(4)*100

Minimum distance and coordinates (by row):

In [116]: np.min(D, axis=1)
Out[116]: array([2.23606798, 1.41421356, 5.38516481, 1.41421356])
In [117]: np.argmin(D, axis=1)
Out[117]: array([1, 3, 1, 1], dtype=int32)
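
The brute-force steps can be collected into one function. This is only a sketch: the function name `nearest_neighbours` is hypothetical, and it swaps the "add 100 to the diagonal" trick for `np.inf`, since a fixed offset only works while every real distance stays below it. The approach builds the full n-by-n distance matrix, so it is meant for small point sets.

```python
import numpy as np

def nearest_neighbours(pts):
    """Return (index of nearest other point, distance to it) for each point."""
    pts = np.asarray(pts, dtype=float)
    # All pairwise Euclidean distances via broadcasting, shape (n, n).
    dist = np.linalg.norm(pts[None, :, :] - pts[:, None, :], axis=2)
    # Exclude each point's zero distance to itself.
    np.fill_diagonal(dist, np.inf)
    idx = np.argmin(dist, axis=1)
    return idx, dist[np.arange(len(pts)), idx]

idx, d = nearest_neighbours([(2, 3), (4, 2), (9, 4), (3, 1)])
print(idx)
print(d)
```

For the four points used throughout, the indices and distances agree with the `col` and `data` arrays of the sparse matrix above.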