Question

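I am applying deepwalk / node2vec to a graph with 15 million nodes and billions of edges. I have generated embeddings for the graph, and I need a sparse matrix to hold the adjacency matrix built from those embeddings. I am using the code from https://github.com/palash1992/GEM/tree/master/gem.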

Here is a small reproducible example, with a much smaller sample graph.

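'embs' is an array of floats with shape (399224, 64). It was generated by the deepwalk algorithm and then processed into bootstrapped edge embeddings by taking the Hadamard product (that code is not shown).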

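import numpy as np
import scipy.sparse

# Generate an approximation of the embeddings
embs = scipy.sparse.random(399224, 64, density=0.01)
embs = embs.toarray()  # np.asarray() on a sparse matrix yields a 0-d object array, not a 2-D float array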

This is the code I want to convert to use a sparse matrix:

node_num = 1000

#Create non-sparse adjacency matrix from embeddings in embs
adj_mtx_r = np.zeros((node_num, node_num))

for v_i in range(node_num):
    for v_j in range(node_num):
        if v_i == v_j:
            continue
        adj_mtx_r[v_i, v_j] = np.dot(embs[v_i, :], embs[v_j, :])
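
For reference, the double loop above can be written as a single matrix product. This is only a sketch: the random `embs` below is a hypothetical stand-in for the real embeddings, and the result is still a dense node_num x node_num array, so it does not solve the memory problem by itself.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the real embeddings: 1000 nodes, 64 dimensions
embs = rng.random((1000, 64))
node_num = 1000

# All pairwise dot products at once: entry (i, j) is embs[i] . embs[j]
adj_mtx_r = embs[:node_num] @ embs[:node_num].T

# The loop skips v_i == v_j, so the diagonal must stay zero
np.fill_diagonal(adj_mtx_r, 0.0)
```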

I have read the following: http://www.scipy-lectures.org/advanced/scipy_sparse/storage_schemes.html

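Below is my attempt to build a sparse matrix using the same algorithm as the code block above. My question is whether this produces the same result as the code above: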

import numpy as np
import scipy.sparse as sp

# Row and column indices of the non-zero matrix entries
row, col = np.nonzero(train_edge_embs)

# Values of the non-zero matrix entries
data = train_edge_embs[row, col]

test_mtx = sp.coo_matrix((data, (row, col)), (len(data), len(data)))
estimated_adj = test_mtx.tocsr()
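
One way to check an attempt like this is to build the sparse matrix on a small input and compare it with the dense result. The sketch below is one possible approach, not the GEM code: it computes the dot products in row blocks and keeps only scores above a cutoff, since without some cutoff the matrix of pairwise dot products is generally not sparse. The function name `sparse_adj_from_embs`, the `threshold` value, and `block_size` are all hypothetical.

```python
import numpy as np
import scipy.sparse as sp

def sparse_adj_from_embs(embs, threshold, block_size=256):
    """Build a CSR matrix of pairwise dot products that exceed `threshold`."""
    n = embs.shape[0]
    blocks = []
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        # Dense scores for this block of rows only: shape (stop - start, n)
        scores = embs[start:stop] @ embs.T
        # Zero the self-pairs (v_i == v_j), as the dense loop skips them
        idx = np.arange(start, stop)
        scores[idx - start, idx] = 0.0
        # Assumed sparsification step: drop scores at or below the cutoff
        scores[scores <= threshold] = 0.0
        blocks.append(sp.csr_matrix(scores))
    return sp.vstack(blocks, format="csr")

rng = np.random.default_rng(0)
embs = rng.random((500, 16))  # hypothetical small embeddings
threshold = 5.0               # hypothetical cutoff
estimated_adj = sparse_adj_from_embs(embs, threshold=threshold, block_size=128)

# Dense reference built the same way, for comparison on this small input
dense = embs @ embs.T
np.fill_diagonal(dense, 0.0)
dense[dense <= threshold] = 0.0
```

Comparing `estimated_adj.toarray()` with `dense` via `np.allclose` then shows whether the two constructions agree.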

I will eventually create an edge list from the adjacency matrix, but that is not shown in this code.
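
As a rough illustration of that last step, a sparse adjacency matrix can be turned into an edge list through its COO form. The 3-node matrix below is made up for the example.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical 3-node adjacency matrix with two weighted edges
estimated_adj = sp.csr_matrix(np.array([[0.0, 0.7, 0.0],
                                        [0.0, 0.0, 0.2],
                                        [0.0, 0.0, 0.0]]))

# COO format exposes parallel arrays of row index, column index and value
coo = estimated_adj.tocoo()
edge_list = list(zip(coo.row.tolist(), coo.col.tolist(), coo.data.tolist()))
```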

Constructing an adjacency matrix from embeddings (word2vec, node2vec ...) using scipy.sparse
