I have the following Series:
groups['combined']
0 (28, 1) 1
1 (32, 1) 1
2 (36, 1) 1
3 (37, 1) 1
4 (84, 1) 1
....
Name: combined, Length: 14476, dtype: object
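How can I convert this data into a .tocoo() matrix and a .tolil() matrix?
For reference, this is how the combined column was formed. The original pandas DataFrame: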
import pandas as pd
pd.DataFrame({0: [28, 32, 36, 37, 84], 1: [1, 1, 1, 1, 1], 2: [1, 1, 1, 1, 1]})
Column 0 has more than 10K unique features, column 1 has 39 groups, and column 2 has only one distinct value.
Answer 0 (score: 1)
Building a COOrdinate-format matrix from the original pandas DataFrame:
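import scipy.sparse as sps
# index on the row (col 0) and column (col 1) labels
groups.set_index([0, 1], inplace=True)
# MultiIndex.codes (named .labels before pandas 0.24) holds each row's integer position within its level
sps.coo_matrix((groups[2], (groups.index.codes[0], groups.index.codes[1])))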
------------- Result ----------
<10312x39 sparse matrix of type '<class 'numpy.int64'>'
with 14476 stored elements in COOrdinate format>
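For anyone who wants to try this on the five sample rows from the question, here is a minimal sketch. It assumes the frame is named groups as above, and it swaps in pd.factorize(..., sort=True) to produce the integer codes instead of set_index, which should give the same positions without modifying the frame in place. The names row_codes, row_labels, col_codes and col_labels are mine, not from the original post.

```python
import pandas as pd
import scipy.sparse as sps

# Sample rows from the question: col 0 = feature, col 1 = group, col 2 = value
groups = pd.DataFrame({0: [28, 32, 36, 37, 84], 1: [1, 1, 1, 1, 1], 2: [1, 1, 1, 1, 1]})

# factorize maps every distinct label to a dense integer position 0..n-1
# (sort=True orders the positions by label value)
row_codes, row_labels = pd.factorize(groups[0], sort=True)
col_codes, col_labels = pd.factorize(groups[1], sort=True)

# coo_matrix takes (data, (row_positions, col_positions))
coo = sps.coo_matrix(
    (groups[2].to_numpy(), (row_codes, col_codes)),
    shape=(len(row_labels), len(col_labels)),
)

print(coo.shape)  # (5, 1): five distinct features, one distinct group
print(coo.nnz)    # 5 stored elements
```

row_labels and col_labels keep the original values, so row_labels[i] tells you which feature matrix row i corresponds to.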
Answer 1 (score: 0)
Regarding the LIL matrix:
print(len(networks[0]), len(networks[1]), networks[0].nunique(), networks[1].nunique())
667966 667966 10312 10312
networks[:5]
0 1
0 176 1
1 233 1
2 283 1
3 371 1
4 394 1
# make row and col labels
rows = networks[0]
cols = networks[1]
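# the third array (the values in column 2) is needed as well;
# MultiIndex.codes (named .labels before pandas 0.24) supplies the integer row/column positions
networks.set_index([0, 1], inplace=True)
Ntw = sps.coo_matrix((networks[2], (networks.index.codes[0],
                                    networks.index.codes[1])))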
d=Ntw.tolil()
d
which produces:
<10312x10312 sparse matrix of type '<class 'numpy.int64'>'
with 667966 stored elements in LInked List format>
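As a small end-to-end illustration of the COO to LIL step, here is a sketch on a made-up four-row edge list (the values below are hypothetical, not taken from the real networks data). It again uses pd.factorize for the integer positions, which is my substitution for the set_index approach above.

```python
import pandas as pd
import scipy.sparse as sps

# Hypothetical edge list: col 0 = row label, col 1 = column label, col 2 = value
networks = pd.DataFrame({0: [176, 233, 283, 176], 1: [1, 1, 5, 5], 2: [1, 1, 1, 1]})

rows, row_labels = pd.factorize(networks[0], sort=True)
cols, col_labels = pd.factorize(networks[1], sort=True)

coo = sps.coo_matrix(
    (networks[2].to_numpy(), (rows, cols)),
    shape=(len(row_labels), len(col_labels)),
)

# Convert to LIL, which supports indexing and element assignment (COO does not)
lil = coo.tolil()
lil[0, 0] = 7

print(lil.format)    # lil
print(lil.rows[0])   # column positions stored in row 0
print(lil.toarray())
```

Building in COO first and converting afterwards seems the simpler route here, since the data already comes as (row, column, value) triples.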