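I am running some operations that produce a numpy.ndarray as output, and I want to store that array in the columns of a pandas DataFrame.
My current approach is: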
import pandas as pd
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity

df_new = pd.DataFrame(index=df['id'], columns=df['id'])
A_sparse = sparse.csr_matrix(df)
sim1 = cosine_similarity(A_sparse[0:5000], A_sparse)
print(sim1)  # up to this point the code runs fine, even for larger datasets
df_new[0:5000] = sim1[:, :]  # this statement crashes every time for 5000 rows
sim1 = cosine_similarity(A_sparse[5000:10000], A_sparse)
df_new[5000:10000] = sim1[:, :]
The code above works perfectly well for small arrays, but it crashes for a large array of shape (5000, 28785).
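One possible explanation, offered as an assumption since no traceback is shown: `pd.DataFrame(index=..., columns=...)` with no data creates `object`-dtype columns, so every assigned value has to be stored as a separate Python object rather than in a packed float buffer. The sketch below avoids that by filling one preallocated `float32` NumPy array block by block and wrapping it in a DataFrame once at the end. The helper name `chunked_cosine_frame`, the `chunk_size` parameter, and the random demo data are hypothetical and not part of the original code.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity


def chunked_cosine_frame(df, chunk_size=5000, dtype=np.float32):
    """Return an (n x n) cosine-similarity DataFrame labelled by df's index.

    Hypothetical helper: computes the similarities in row blocks and writes
    each block into one preallocated numeric array.
    """
    a_sparse = sparse.csr_matrix(df.to_numpy())
    n = a_sparse.shape[0]
    # One contiguous numeric buffer; float32 takes half the memory of float64.
    out = np.empty((n, n), dtype=dtype)
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # With default arguments this returns a dense (stop - start, n) array.
        out[start:stop] = cosine_similarity(a_sparse[start:stop], a_sparse)
    # Wrap the finished array once instead of assigning slices into a frame.
    return pd.DataFrame(out, index=df.index, columns=df.index)


# The empty frame built as in the question holds generic Python objects.
empty = pd.DataFrame(index=[1, 2, 3], columns=[1, 2, 3])
print(empty.dtypes.unique())

# Small random stand-in for the real data (assumption: positive integers).
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.integers(0, 4, size=(12, 6)) + 1,
                    index=pd.Index(range(100, 112), name="id"))
sim = chunked_cosine_frame(demo, chunk_size=5)
print(sim.shape, sim.dtypes.unique())
```

If the full data has 28785 rows, as the shape above suggests, the complete similarity matrix has about 8.3e8 entries: roughly 6.6 GB as float64 or 3.3 GB as float32, so it may not fit in memory on every machine even when stored as a plain numeric array.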
Sample data for reference:
df.head(5):
       Qua  F  M  0  35  5  E  \
id
11391    1  0  1  0   0  1  1
28400    1  1  0  0   1  0  0
30268    3  1  0  0   1  0  0
31604    3  1  0  0   1  0  0
32885    2  1  0  1   0  0  0

       Ea  Ir  N  HR  \
id
11391   0   0  0   0
28400   0   0  0   0
30268   0   0  0   0
31604   0   0  0   0
32885   0   0  0   0

       No  Sc  So  Sou  \
id
11391   0   0   0    0
28400   0   1   0    0
30268   1   0   0    0
31604   0   0   1    0
32885   0   0   0    0

       SA  Wa  WM  YR
id
11391   0   0   0   0
28400   0   0   0   0
30268   0   0   0   0
31604   0   0   0   0
32885   0   0   1   0
A_sparse:
(0, 0) 1
(0, 2) 1
(0, 5) 1
(0, 6) 1
(1, 0) 1
(1, 1) 1
(1, 4) 1
(1, 12) 1
(2, 0) 3
(2, 1) 1
(2, 4) 1
(2, 11) 1
(3, 0) 3
(3, 1) 1
(3, 4) 1
(3, 13) 1
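For readers who want to reproduce the listing above, here is a minimal sketch that rebuilds the first four rows of `df.head(5)` as a dense array and converts them to CSR. It assumes the 19 columns appear in the printed order; the names `rows`, `a_small`, and `triplets` are mine.

```python
import numpy as np
from scipy import sparse

# First four rows of df.head(5), with the 19 columns in printed order:
# Qua F M 0 35 5 E | Ea Ir N HR | No Sc So Sou | SA Wa WM YR
rows = np.array([
    [1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # id 11391
    [1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],  # id 28400
    [3, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],  # id 30268
    [3, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],  # id 31604
])

a_small = sparse.csr_matrix(rows)

# Collect the stored entries as (row, column, value), sorted by position.
coo = a_small.tocoo()
triplets = sorted(
    (int(r), int(c), int(v)) for r, c, v in zip(coo.row, coo.col, coo.data)
)
for r, c, v in triplets:
    print((r, c), v)
```

Each printed pair is the (row, column) position of a non-zero cell followed by its value, so only 16 of the 76 cells in these four rows are actually stored.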
sim1:
`[1.0, 0.25, 0.43301270189221935, 0.43301270189221935, 0.3779644730092272, 0.5773502691896258, 0.5, 0.43301270189221935, 0.5773502691896258, 0.5773502691896258, 0.75, 0.3779644730092272, 0.25, 0.3779644730092272, 0.25, 0.25, 0.75, 0.5669467095138407, 0.7216878364870323, 0.43301270189221935, 0.43301270189221935, 0.5773502691896258, 0.5, 0.5669467095138407, 0.5773502691896258, 0.43301270189221935, 0.5669467095138407, 0.43301270189221935, 0.5773502691896258, 0.43301270189221935, 0.5773502691896258, 0.43301270189221935, 0.3779644730092272, 0.43301270189221935, 0.5773502691896258, 0.5773502691896258, 0.43301270189221935, 0.5669467095138407, 0.5669467095138407, 0.43301270189221935, 0.43301270189221935, 0.7559289460184544, 0.5773502691896258, 0.5, 0.5773502691896258, 0.5773502691896258, 0.75, 0.25, 0.43301270189221935, 0.3779644730092272, 0.7216878364870323, 0.7216878364870323, 0.43301270189221935, 0.5773502691896258, 0.7216878364870323, 0.5669467095138407, 0.75, 0.3779644730092272, 0.5773502691896258, 0.5669467095138407, 0.43301270189221935, 0.5, 0.3779644730092272, 0.43301270189221935, 1.0, 0.43301270189221935, 0.43301270189221935, 0.5773502691896258, 0.25, 0.5773502691896258, 0.5669467095138407, 0.3779644730092272, 0.5773502691896258, 0.5669467095138407, 0.5669467095138407, 0.5669467095138407, 0.5669467095138407, 0.25, 0.75, 0.25, 0.5773502691896258, 0.43301270189221935, 0.43301270189221935, 0.43301270189221935, 0.43301270189221935, 0.5, 0.5773502691896258, 0.43301270189221935, 0.5773502691896258, 0.7216878364870323, 0.43301270189221935, 0.5669467095138407, 0.7559289460184544, 0.5773502691896258, 0.5773502691896258, 0.7559289460184544, 0.43301270189221935, 0.5773502691896258, 0.43301270189221935, 0.3779644730092272, 0.43301270189221935, 0.5773502691896258, 0.43301270189221935, 0.75, 0.5773502691896258, 0.3779644730092272, 0.43301270189221935, 0.5773502691896258, 0.5, 0.5669467095138407, 0.5669467095138407, 0.7216878364870323, 0.5, 0.5773502691896258, 0.5, 
0.5773502691896258, 0.5, 0.5669467095138407, 0.3779644730092272, 0.5773502691896258, 0.5773502691896258, 0.3779644730092272, 0.43301270189221935, 0.43301270189221935, 0.43301270189221935, 0.5773502691896258, 0.5773502691896258, 0.5669467095138407, 0.7216878364870323, 0.7559289460184544, 0.5669467095138407, 0.43301270189221935, 0.5669467095138407, 0.7559289460184544, 0.43301270189221935, 0.5, 0.43301270189221935, 0.5, 0.43301270189221935, 0.75, 0.5773502691896258, 0.5773502691896258, 0.5773502691896258, 0.5773502691896258, 0.5, 0.25, 0.5773502691896258, 0.5773502691896258, 0.43301270189221935, 0.5773502691896258, 0.5773502691896258, 0.7216878364870323, 0.75, 0.5773502691896258, 0.5773502691896258, 0.43301270189221935, 0.5773502691896258, 0.5773502691896258, 0.43301270189221935, 0.25, 0.43301270189221935, 0.5773502691896258, 0.43301270189221935, 0.43301270189221935, 0.5773502691896258, 0.3779644730092272, 0.43301270189221935, 0.3779644730092272, 0.5669467095138407, 0.3779644730092272, 0.43301270189221935, 0.5669467095138407, 0.3779644730092272, 0.3779644730092272, 0.3779644730092272, 0.3779644730092272, 0.25, 0.3779644730092272, 0.5669467095138407, 0.43301270189221935, 0.43301270189221935, 0.7559289460184544, 0.3779644730092272, 0.43301270189221935, 0.5, 0.377964473009227]`
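As a sanity check, the first five values of `sim1` can be recomputed by hand from the five sample rows. This sketch uses plain NumPy instead of `cosine_similarity`, and assumes the printed list is the first row of `sim1` and that the sample rows are the first five rows of the data; `sample`, `X`, and `sim_small` are illustrative names.

```python
import numpy as np

# Non-zero entries of the five rows printed by df.head(5), as {column: value},
# with columns numbered 0..18 in the printed order.
sample = {
    11391: {0: 1, 2: 1, 5: 1, 6: 1},
    28400: {0: 1, 1: 1, 4: 1, 12: 1},
    30268: {0: 3, 1: 1, 4: 1, 11: 1},
    31604: {0: 3, 1: 1, 4: 1, 13: 1},
    32885: {0: 2, 1: 1, 3: 1, 17: 1},
}

n_cols = 19
X = np.zeros((len(sample), n_cols))
for i, entries in enumerate(sample.values()):
    for col, val in entries.items():
        X[i, col] = val

# Cosine similarity: dot products divided by the product of the row norms.
norms = np.linalg.norm(X, axis=1)
sim_small = (X @ X.T) / np.outer(norms, norms)

first_row = [round(float(v), 4) for v in sim_small[0]]
print(first_row)  # [1.0, 0.25, 0.433, 0.433, 0.378]
```

These agree with the leading entries 1.0, 0.25, 0.43301270189221935, 0.43301270189221935, 0.3779644730092272 shown above, so the similarity computation itself looks correct and the problem is confined to storing the result.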