Pandas DataFrame to COO matrix and LIL matrix

Time: 2020-01-08 22:44:02

Tags: pandas numpy scipy sparse-matrix

I have the following Series:

    groups['combined']

    0    (28, 1)  1
    1    (32, 1)  1
    2    (36, 1)  1
    3    (37, 1)  1
    4    (84, 1)  1
    ....
    Name: combined, Length: 14476, dtype: object

How can I convert this into a COO matrix (.tocoo()) and a LIL matrix (.tolil())?

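For reference, the combined column was formed from an original pandas DataFrame like this:

    import pandas as pd
    pd.DataFrame({0: [28, 32, 36, 37, 84], 1: [1, 1, 1, 1, 1], 2: [1, 1, 1, 1, 1]})

Column 0 has more than 10K unique features, column 1 has 39 groups, and column 2 has only one value.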

2 answers:

Answer 0 (score: 1)

Forming a COOrdinate-format matrix from the original pandas DataFrame:

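    import scipy.sparse as sps

    # Move columns 0 and 1 into a MultiIndex; its integer codes become row/col positions
    groups.set_index([0, 1], inplace=True)
    # MultiIndex.labels was renamed to MultiIndex.codes in pandas 0.24 (labels was removed in 1.0)
    sps.coo_matrix((groups[2], (groups.index.codes[0], groups.index.codes[1])))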
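A minimal, self-contained sketch of the same approach on the five-row toy DataFrame from the question. The names df, indexed, row_pos and col_pos are illustrative only. Each MultiIndex level holds the sorted distinct keys, and codes gives each row's position within that level:

```python
import pandas as pd
import scipy.sparse as sps

# Toy frame from the question: column 0 -> row key, column 1 -> column key, column 2 -> value
df = pd.DataFrame({0: [28, 32, 36, 37, 84], 1: [1, 1, 1, 1, 1], 2: [1, 1, 1, 1, 1]})

# Move the two key columns into a MultiIndex (non-destructive: returns a new frame)
indexed = df.set_index([0, 1])

# codes[i] maps each row's key to its position in level i (0 .. n_distinct - 1)
row_pos = indexed.index.codes[0]
col_pos = indexed.index.codes[1]

# coo_matrix takes (data, (row, col)); the shape is inferred from the largest positions
coo = sps.coo_matrix((indexed[2].to_numpy(), (row_pos, col_pos)))
print(coo.shape, coo.nnz)  # (5, 1) 5
```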

which gives:

<10312x39 sparse matrix of type '<class 'numpy.int64'>'
    with 14476 stored elements in COOrdinate format>
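The 10312x39 shape follows from the codes: each level's codes run from 0 to (number of distinct values - 1), and coo_matrix infers the shape as the largest row and column position plus one. As a swapped-in alternative (not what the answer above uses), pd.factorize produces the same kind of integer positions without modifying the frame, and passing shape makes the dimensions explicit. A sketch on the question's toy data, with illustrative variable names:

```python
import pandas as pd
import scipy.sparse as sps

df = pd.DataFrame({0: [28, 32, 36, 37, 84], 1: [1, 1, 1, 1, 1], 2: [1, 1, 1, 1, 1]})

# pd.factorize returns integer codes plus the distinct keys (sorted because sort=True)
row_pos, row_keys = pd.factorize(df[0], sort=True)
col_pos, col_keys = pd.factorize(df[1], sort=True)

# An explicit shape ties the dimensions to the number of distinct keys in each column
coo = sps.coo_matrix(
    (df[2].to_numpy(), (row_pos, col_pos)),
    shape=(len(row_keys), len(col_keys)),
)
print(coo.shape)  # (5, 1)
```

The row_keys and col_keys arrays also let you map a matrix position back to the original key.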

Answer 1 (score: 0)

Regarding the LIL matrix:

    print(len(networks[0]), len(networks[1]), networks[0].nunique(), networks[1].nunique())
    667966 667966 10312 10312

    networks[:5]

         0  1
    0  176  1
    1  233  1
    2  283  1
    3  371  1
    4  394  1

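    import scipy.sparse as sps

    # make row and col labels
    rows = networks[0]
    cols = networks[1]

    # the third (data) array is essential: coo_matrix needs (data, (row, col)).
    # networks[:5] shows only columns 0 and 1, so (assumption) add a column of ones as the values
    networks[2] = 1

    networks.set_index([0, 1], inplace=True)
    # MultiIndex.labels was renamed to MultiIndex.codes in pandas 0.24
    Ntw = sps.coo_matrix((networks[2], (networks.index.codes[0], networks.index.codes[1])))

    d = Ntw.tolil()
    d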

which produces:

    <10312x10312 sparse matrix of type '<class 'numpy.int64'>'
        with 667966 stored elements in LInked List format>
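A small, self-contained sketch of the same COO-to-LIL path. The four-row edge list is hypothetical (made-up values standing in for networks), and the weight column of ones is an assumption, mirroring the third data array above. A LIL matrix stores, for each row, a list of column indices (lil.rows) and a parallel list of values (lil.data):

```python
import pandas as pd
import scipy.sparse as sps

# Hypothetical two-column edge list standing in for `networks` (values are made up)
networks = pd.DataFrame({0: [176, 233, 283, 176], 1: [1, 1, 1, 233]})

# Assumed weight column: the data array that coo_matrix needs
networks[2] = 1

indexed = networks.set_index([0, 1])
coo = sps.coo_matrix(
    (indexed[2].to_numpy(), (indexed.index.codes[0], indexed.index.codes[1]))
)

# Convert COO -> LIL; each row becomes a list of column positions plus their values
lil = coo.tolil()
print(lil.format, lil.shape, lil.nnz)
print(lil.rows[0], lil.data[0])  # row for key 176: columns for keys 1 and 233
```

Note that here the row and column positions come from separate levels, so the same key (e.g. 233) can get different positions on the two axes.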