Question

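I have a large dataset for which I need to compute the Euclidean distance matrix, but I am limited in RAM, and using np.float64 (the default) as the dtype makes the PC run out of memory. I use squared distances, because they are faster to compute and the results are integers anyway.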

Using .astype(np.int32) does not solve the problem, because the matrix is still created as float64 first.
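To put numbers on this: a dense n x n matrix needs n² × itemsize bytes, and while `.astype(np.int32)` runs, the float64 original and the new int32 copy exist at the same time. A small sketch of that arithmetic; the helper name `matrix_bytes` and the size n = 50,000 are illustrative assumptions, not taken from the question:

```python
import numpy as np

def matrix_bytes(n, dtype):
    """Bytes needed for a dense n x n matrix of the given dtype."""
    return n * n * np.dtype(dtype).itemsize

n = 50_000  # illustrative size only
f64 = matrix_bytes(n, np.float64)
i32 = matrix_bytes(n, np.int32)
# astype() allocates a new array, so the float64 original and the
# int32 copy coexist until the original is released.
peak_with_astype = f64 + i32

print(f64 / 1e9, i32 / 1e9, peak_with_astype / 1e9)  # → 20.0 10.0 30.0 (GB)
```

So converting afterwards raises the peak instead of lowering it; the saving only appears once the float64 array is freed.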

The dataset itself is int32, but the returned matrix is float64:

from sklearn.metrics import pairwise_distances
matrix = pairwise_distances(dataset, metric='euclidean', squared=True)
print(matrix.dtype)

float64

How can I get it directly as an int array?
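One possible way to get an int32 matrix without ever allocating a full float64 one is to skip `pairwise_distances` and fill a preallocated int32 array in row blocks with plain NumPy. This is a sketch under stated assumptions, not something from the thread: `squared_distances_int32` and `block_rows` are hypothetical names, the coordinates are assumed small enough that int64 dot products do not overflow, and every squared distance is assumed to fit in int32 (the function raises if one does not). Because it stays in integer arithmetic, the result is exact.

```python
import numpy as np

def squared_distances_int32(X, block_rows=256):
    """Exact squared Euclidean distances for integer data, filled in
    block_rows rows at a time into a preallocated int32 matrix, so no
    full n x n float64 (or int64) matrix is ever allocated."""
    # Widen only the (n, d) input to int64 so dot products and norms
    # do not overflow for moderately sized coordinates (assumption).
    X = np.asarray(X, dtype=np.int64)
    n = X.shape[0]
    sq_norms = np.einsum("ij,ij->i", X, X)  # ||x_i||^2 for every row
    out = np.empty((n, n), dtype=np.int32)
    int32_max = np.iinfo(np.int32).max
    for start in range(0, n, block_rows):
        stop = min(start + block_rows, n)
        # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, evaluated exactly in int64
        block = X[start:stop] @ X.T
        block *= -2
        block += sq_norms[start:stop, None]
        block += sq_norms[None, :]
        if block.max() > int32_max:
            raise OverflowError("a squared distance does not fit in int32")
        out[start:stop] = block  # safe narrowing: range-checked above
    return out

data = np.array([[0, 0], [3, 4], [6, 8]], dtype=np.int32)
D = squared_distances_int32(data)
print(D.dtype)  # int32
```

Extra memory beyond the int32 result is a few `block_rows x n` int64 temporaries, so a smaller `block_rows` lowers the peak at the cost of more loop iterations. NumPy only uses BLAS for floating-point matrix products, so this integer path is slower than a float computation.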

Answer 1

So the problem is that you want to make your data more RAM-efficient.

You can convert the returned matrix to int32 (accepting some loss of data) or to float32, which works better. Here is my example code:
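On the int32 versus float32 trade-off: for squared distances of integer data, which is the case in the question, int32 stores every value up to 2**31 - 1 exactly. float32 has only 24 bits of precision, so it represents every integer exactly only up to 2**24. A quick check:

```python
import numpy as np

big = 2**24 + 1  # 16_777_217, the smallest positive integer float32 cannot hold
as_float32 = int(np.float32(big))  # rounded to the nearest float32 value
as_int32 = int(np.int32(big))      # stored exactly
print(as_float32, as_int32)        # → 16777216 16777217
print(np.iinfo(np.int32).max)      # → 2147483647
```

Whether float32 "works better" therefore depends on the range of the distances: below 2**24 both are exact, and between 2**24 and 2**31 - 1 only int32 is.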

import numpy as np
arr = np.array([[1.0, 4, 5, 12], 
    [-5.9, 8, 9.7, 0],
    [-6, 7, 11, 19]])
print(arr.dtype) 
arr = arr.astype('float32') 
print(arr.dtype)

The output is:

float64
float32
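The `astype('float32')` call above runs after a float64 array already exists, so for a matrix returned by `pairwise_distances` the peak memory is not reduced. If float32 is acceptable, one way to apply the suggestion at the source is to compute the distances in float32 from the start. A minimal NumPy sketch (the function name is hypothetical and this does not call `pairwise_distances`); in-place updates keep it to a single n x n float32 array, and float32 is only exact for integers up to 2**24:

```python
import numpy as np

def squared_distances_float32(X):
    """Squared Euclidean distances computed in float32 from the start,
    using in-place updates so only one n x n float32 array is allocated."""
    X = np.asarray(X, dtype=np.float32)
    sq_norms = np.einsum("ij,ij->i", X, X)  # ||x_i||^2, float32
    D = X @ X.T                              # n x n float32 Gram matrix
    D *= -2
    D += sq_norms[:, None]
    D += sq_norms[None, :]
    np.maximum(D, 0, out=D)  # rounding can leave tiny negative values
    return D

data = np.array([[0, 0], [3, 4], [6, 8]], dtype=np.int32)
D = squared_distances_float32(data)
print(D.dtype)  # float32
```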
