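Hobbyist programmer here. I'm using SciPy's spatial.distance.cosine function to compute the cosine similarity between the columns of a large pandas DataFrame (~15k columns, ~100k entries). The computation takes a very long time, and I'm wondering whether there is a way to lower the precision to make it faster. I'd be happy to change the output dtype from float64 to float16.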
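For reference, SciPy defines the cosine distance as $d_{\cos}(\mathbf{u},\mathbf{v}) = 1 - \mathbf{u}^\top\mathbf{v} / (\lVert\mathbf{u}\rVert_2 \lVert\mathbf{v}\rVert_2)$, so `1 - cosine(u, v)` is the cosine similarity. Stacking the $m$ columns into a matrix $X$ (one column $\mathbf{x}_j$ per DataFrame column) lets all pairs be written as one matrix product. This is a standard identity, offered here as a sketch of why the per-pair loop can be replaced:

```latex
S_{ij} = 1 - d_{\cos}(\mathbf{x}_i, \mathbf{x}_j)
       = \frac{\mathbf{x}_i^\top \mathbf{x}_j}{\lVert\mathbf{x}_i\rVert_2 \, \lVert\mathbf{x}_j\rVert_2},
\qquad
S = D^{-1} X^\top X \, D^{-1},
\quad
D = \operatorname{diag}\bigl(\lVert\mathbf{x}_1\rVert_2, \dots, \lVert\mathbf{x}_m\rVert_2\bigr)
```

Each entry of $X^\top X$ is one dot product $\mathbf{x}_i^\top\mathbf{x}_j$, and multiplying by $D^{-1}$ on both sides divides row $i$ by $\lVert\mathbf{x}_i\rVert_2$ and column $j$ by $\lVert\mathbf{x}_j\rVert_2$.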
My code looks like this:
import pandas as pd
import scipy.spatial.distance

rows = {}
for index1, col1 in df.items():  # iteritems() was removed in pandas 2.0
    row = {}
    u = col1.to_numpy()  # as_matrix() was removed in pandas 1.0
    for index2, col2 in df.items():
        v = col2.to_numpy()
        similarity = 1 - scipy.spatial.distance.cosine(u, v)
        # I guess I could cast this to float16 afterwards
        row[index2] = similarity
    rows[index1] = row
    print(index1)  # for display reasons
# Build the result once at the end instead of calling pd.concat inside the loop
cosine_similarity = pd.DataFrame.from_dict(rows, orient="index")
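The double loop above calls `cosine` about 15k × 15k = 225 million times, each with Python-level overhead, so the loop itself is likely a bigger cost than float64 arithmetic. A vectorized alternative (a swapped-in technique, not part of SciPy's `cosine` API) computes every pairwise dot product with one matrix product and then divides by the column norms. The helper name `column_cosine_similarity` is hypothetical; this is a minimal sketch assuming `df` contains only numeric columns and no all-zero columns (those would produce NaN, as they also do with SciPy).

```python
import numpy as np
import pandas as pd


def column_cosine_similarity(df: pd.DataFrame) -> pd.DataFrame:
    """Cosine similarity between every pair of columns of a numeric DataFrame.

    Matches 1 - scipy.spatial.distance.cosine(u, v) for each column pair up to
    floating-point rounding, but uses one matrix product instead of a double loop.
    """
    x = df.to_numpy(dtype=np.float64)    # shape (n_rows, n_cols)
    norms = np.linalg.norm(x, axis=0)    # L2 norm of each column
    gram = x.T @ x                       # all pairwise dot products, shape (n_cols, n_cols)
    sim = gram / np.outer(norms, norms)  # divide entry (i, j) by ||x_i|| * ||x_j||
    return pd.DataFrame(sim, index=df.columns, columns=df.columns)


# Small usage example with three 2-element columns (hypothetical data)
example = pd.DataFrame({"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]})
result = column_cosine_similarity(example)
print(result.loc["a", "c"])  # a.c = 1 and ||a|| * ||c|| = sqrt(2), so about 0.7071
```

The matrix product is dispatched to a BLAS routine, which is usually far faster than 225 million separate Python calls. Note that the 15k × 15k result alone holds 225 million values, about 1.8 GB in float64.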
Ideally, I'd like to do something like this:
similarity = 1 - scipy.spatial.distance.cosine(u, v, dtype=float16)  # hypothetical: cosine() has no dtype parameter
What approach should I take?
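Since `scipy.spatial.distance.cosine` takes no dtype argument, one option is to cast the data before computing. A minimal sketch, assuming float32 accuracy is acceptable: compute in float32 (NumPy still uses BLAS for float32 matrix products) and downcast only the final matrix to float16 for storage. In NumPy, float16 matrix products are generally not BLAS-accelerated on CPUs and tend to be slower, not faster. For the 15k × 15k output, float16 storage is about 450 MB versus about 1.8 GB in float64; assuming ~100k rows, the float32 input copy is about 6 GB versus 12 GB in float64. The helper name and its `compute_dtype`/`out_dtype` parameters are hypothetical.

```python
import numpy as np
import pandas as pd


def column_cosine_similarity_lowp(
    df: pd.DataFrame,
    compute_dtype=np.float32,  # assumption: float32 keeps enough accuracy for your use
    out_dtype=np.float16,      # storage precision of the returned matrix
) -> pd.DataFrame:
    """Pairwise column cosine similarity computed in reduced precision."""
    x = df.to_numpy(dtype=compute_dtype)  # cast once, before any arithmetic
    x = x / np.linalg.norm(x, axis=0)     # scale every column to unit length (new array)
    sim = x.T @ x                         # dot products of unit vectors are the cosines
    # Downcast only the final result; float16 keeps roughly 3 significant decimal digits
    return pd.DataFrame(sim.astype(out_dtype), index=df.columns, columns=df.columns)


# Usage example (hypothetical data): every column of the result is float16
example = pd.DataFrame({"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]})
print(column_cosine_similarity_lowp(example).dtypes)
```

Normalizing the columns first means the matrix product yields the cosines directly, so no separate division of the large output matrix is needed.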