Storing a very large numpy.ndarray in the cells of a DataFrame

Time: 2019-06-01 04:57:38

Tags: python pandas numpy numpy-ndarray

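I am running some operations that produce a numpy.ndarray as output, and I want to store that array in the columns of a Pandas DataFrame.

My current approach is: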

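# Imports inferred from the names used below
import pandas as pd
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity

# Empty square frame keyed by id on both axes
df_new = pd.DataFrame(index=df['id'], columns=df['id'])
A_sparse = sparse.csr_matrix(df)

sim1 = cosine_similarity(A_sparse[0:5000], A_sparse)
print(sim1)  # up to this point the code runs fine, even for larger datasets

df_new[0:5000] = sim1[:, :]  # this statement crashes every time for 5000 rows

sim1 = cosine_similarity(A_sparse[5000:10000], A_sparse)
df_new[5000:10000] = sim1[:, :]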

The code above runs perfectly well for small arrays, but it crashes for a large array of shape (5000, 28785).

Sample data for reference:
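One plausible cause, not confirmed by anything shown in the question, is memory: a frame built with only an index and columns starts out with object dtype, and the full 28785 x 28785 result needs about 6.6 GB even as plain float64 (28785 * 28785 * 8 bytes). The sketch below is a possible workaround rather than a verified fix: it fills a preallocated float32 NumPy array block by block and wraps it in a DataFrame once at the end. The helper name chunked_cosine_similarity, the chunk size, the float32 choice, and the small random stand-in data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity


def chunked_cosine_similarity(A_sparse, ids, chunk_size=5000, dtype=np.float32):
    """Hypothetical helper: compute the similarity matrix one row block at a time."""
    n = A_sparse.shape[0]
    # One contiguous numeric buffer instead of an object-dtype DataFrame.
    out = np.empty((n, n), dtype=dtype)
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # Each block has shape (stop - start, n), like sim1 in the question.
        out[start:stop] = cosine_similarity(A_sparse[start:stop], A_sparse)
    # copy=False asks pandas to reuse the buffer where it can.
    return pd.DataFrame(out, index=ids, columns=ids, copy=False)


# Small random stand-in for the real data (assumed values, 12 rows x 6 features).
rng = np.random.default_rng(0)
features = rng.integers(1, 4, size=(12, 6))
ids = np.arange(100, 112)  # hypothetical ids
A_sparse = sparse.csr_matrix(features)

df_new = chunked_cosine_similarity(A_sparse, ids, chunk_size=5)
print(df_new.shape, df_new.dtypes.iloc[0])
```

With float32 the full matrix would take roughly 3.3 GB. If that still does not fit in RAM, the same loop could write into an np.memmap file instead of an in-memory array.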

df.head(5)

       Qua  F  M  0  35  5  E  \
id
11391    1  0  1  0   0  1  1
28400    1  1  0  0   1  0  0
30268    3  1  0  0   1  0  0
31604    3  1  0  0   1  0  0
32885    2  1  0  1   0  0  0

       Ea  Ir  N  HR  \
id
11391   0   0  0   0
28400   0   0  0   0
30268   0   0  0   0
31604   0   0  0   0
32885   0   0  0   0

       No  Sc  So  Sou  \
id
11391   0   0   0    0
28400   0   1   0    0
30268   1   0   0    0
31604   0   0   1    0
32885   0   0   0    0

       SA  Wa  WM  YR
id
11391   0   0   0   0
28400   0   0   0   0
30268   0   0   0   0
31604   0   0   0   0
32885   0   0   1   0

A_sparse

(0, 0)  1
(0, 2)  1
(0, 5)  1
(0, 6)  1
(1, 0)  1
(1, 1)  1
(1, 4)  1
(1, 12) 1
(2, 0)  3
(2, 1)  1
(2, 4)  1
(2, 11) 1
(3, 0)  3
(3, 1)  1
(3, 4)  1
(3, 13) 1

sim1

`[1.0, 0.25, 0.43301270189221935, 0.43301270189221935, 0.3779644730092272, 0.5773502691896258, 0.5, 0.43301270189221935, 0.5773502691896258, 0.5773502691896258, 0.75, 0.3779644730092272, 0.25, 0.3779644730092272, 0.25, 0.25, 0.75, 0.5669467095138407, 0.7216878364870323, 0.43301270189221935, 0.43301270189221935, 0.5773502691896258, 0.5, 0.5669467095138407, 0.5773502691896258, 0.43301270189221935, 0.5669467095138407, 0.43301270189221935, 0.5773502691896258, 0.43301270189221935, 0.5773502691896258, 0.43301270189221935, 0.3779644730092272, 0.43301270189221935, 0.5773502691896258, 0.5773502691896258, 0.43301270189221935, 0.5669467095138407, 0.5669467095138407, 0.43301270189221935, 0.43301270189221935, 0.7559289460184544, 0.5773502691896258, 0.5, 0.5773502691896258, 0.5773502691896258, 0.75, 0.25, 0.43301270189221935, 0.3779644730092272, 0.7216878364870323, 0.7216878364870323, 0.43301270189221935, 0.5773502691896258, 0.7216878364870323, 0.5669467095138407, 0.75, 0.3779644730092272, 0.5773502691896258, 0.5669467095138407, 0.43301270189221935, 0.5, 0.3779644730092272, 0.43301270189221935, 1.0, 0.43301270189221935, 0.43301270189221935, 0.5773502691896258, 0.25, 0.5773502691896258, 0.5669467095138407, 0.3779644730092272, 0.5773502691896258, 0.5669467095138407, 0.5669467095138407, 0.5669467095138407, 0.5669467095138407, 0.25, 0.75, 0.25, 0.5773502691896258, 0.43301270189221935, 0.43301270189221935, 0.43301270189221935, 0.43301270189221935, 0.5, 0.5773502691896258, 0.43301270189221935, 0.5773502691896258, 0.7216878364870323, 0.43301270189221935, 0.5669467095138407, 0.7559289460184544, 0.5773502691896258, 0.5773502691896258, 0.7559289460184544, 0.43301270189221935, 0.5773502691896258, 0.43301270189221935, 0.3779644730092272, 0.43301270189221935, 0.5773502691896258, 0.43301270189221935, 0.75, 0.5773502691896258, 0.3779644730092272, 0.43301270189221935, 0.5773502691896258, 0.5, 0.5669467095138407, 0.5669467095138407, 0.7216878364870323, 0.5, 0.5773502691896258, 0.5, 
0.5773502691896258, 0.5, 0.5669467095138407, 0.3779644730092272, 0.5773502691896258, 0.5773502691896258, 0.3779644730092272, 0.43301270189221935, 0.43301270189221935, 0.43301270189221935, 0.5773502691896258, 0.5773502691896258, 0.5669467095138407, 0.7216878364870323, 0.7559289460184544, 0.5669467095138407, 0.43301270189221935, 0.5669467095138407, 0.7559289460184544, 0.43301270189221935, 0.5, 0.43301270189221935, 0.5, 0.43301270189221935, 0.75, 0.5773502691896258, 0.5773502691896258, 0.5773502691896258, 0.5773502691896258, 0.5, 0.25, 0.5773502691896258, 0.5773502691896258, 0.43301270189221935, 0.5773502691896258, 0.5773502691896258, 0.7216878364870323, 0.75, 0.5773502691896258, 0.5773502691896258, 0.43301270189221935, 0.5773502691896258, 0.5773502691896258, 0.43301270189221935, 0.25, 0.43301270189221935, 0.5773502691896258, 0.43301270189221935, 0.43301270189221935, 0.5773502691896258, 0.3779644730092272, 0.43301270189221935, 0.3779644730092272, 0.5669467095138407, 0.3779644730092272, 0.43301270189221935, 0.5669467095138407, 0.3779644730092272, 0.3779644730092272, 0.3779644730092272, 0.3779644730092272, 0.25, 0.3779644730092272, 0.5669467095138407, 0.43301270189221935, 0.43301270189221935, 0.7559289460184544, 0.3779644730092272, 0.43301270189221935, 0.5, 0.377964473009227]`
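As a sanity check on the values above, the cosine similarity of the first sample row against the five rows shown in df.head(5) can be recomputed by hand. This is a minimal sketch that assumes the printed sim1 list is the row for id 11391 and that the 19 feature columns are ordered as in the sample output; the variable names are illustrative.

```python
import numpy as np

# Non-zero entries of the five sample rows, by column position (19 columns).
rows = np.zeros((5, 19))
rows[0, [0, 2, 5, 6]] = 1               # id 11391
rows[1, [0, 1, 4, 12]] = 1              # id 28400
rows[2, [0, 1, 4, 11]] = [3, 1, 1, 1]   # id 30268
rows[3, [0, 1, 4, 13]] = [3, 1, 1, 1]   # id 31604
rows[4, [0, 1, 3, 17]] = [2, 1, 1, 1]   # id 32885

# cos(u, v) = u.v / (|u| |v|), computed for row 0 against every row.
norms = np.linalg.norm(rows, axis=1)
sims = rows @ rows[0] / (norms * norms[0])
print(np.round(sims, 4))  # approx. 1.0, 0.25, 0.4330, 0.4330, 0.3780
```

These agree with the first five entries of sim1, which suggests the similarity step itself is behaving as expected and the problem lies in the assignment to df_new.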

0 answers:

No answers