Question

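I am working on an algorithm that uses Dask to avoid memory errors, since the data seems to be too large to process on my computer. In one step of the algorithm, I want to take a sparse coefficient matrix coefs of shape (M, N) and perform an element-wise multiplication with an array T of shape (K, M, N), which is made up of K matrices of size MxN. Since coefs is a sparse matrix, I would like to take that into account instead of using a dense array.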
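As a minimal illustration of the operation described above, the sketch below multiplies a (K, M, N) array by an (M, N) matrix using plain NumPy broadcasting. The small sizes, the random data, and the looped check are assumptions made only for this example; they are not taken from the real problem.

```python
import numpy as np

# Small, made-up sizes so the example runs instantly.
K, M, N = 4, 6, 5
rng = np.random.RandomState(0)

T = rng.rand(K, M, N)        # K matrices of shape (M, N)
coefs = rng.rand(M, N)
coefs[coefs < 0.92] = 0      # mostly zeros, stored densely here

# Broadcasting aligns coefs with the two trailing axes of T,
# so every one of the K matrices is multiplied element-wise by coefs.
scaled = T * coefs

# The same result written as an explicit loop over the first axis.
looped = np.stack([T[k] * coefs for k in range(K)])
```

With a dense ndarray this broadcast is unambiguous; the question is whether the same expression can work when coefs is held in a sparse container.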

I am using the following configuration:

Python version: 3.7.3
NumPy version: 1.16.2
Dask version: 1.2.0
SciPy version: 1.2.1
sparse version: 0.7.0

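I tried three different possibilities.

Version 1

The coefficient matrix is simply a numpy.ndarray.

Version 2

The coefficient matrix is a scipy.sparse.csr.csr_matrix.

Version 3

The coefficient matrix is a sparse.coo.core.COO array.

Reference: https://sparse.pydata.org/en/latest/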

The code:

```python
import numpy as np
import dask.array as da
import scipy.sparse as sp
import sparse


def func_test(coef):
    K, M, N, tol = 10, 2000, 800, 1e-7

    # Starting values (each row normalized to sum to 1)
    P1 = np.random.rand(K, M)
    P1 /= P1.sum(1)[:, None]
    P2 = np.random.rand(K, N)
    P2 /= P2.sum(1)[:, None]
    P3 = np.random.rand(K)
    P3 /= P3.sum()

    # Convert arrays to dask
    P2 = da.from_array(P2, chunks=(1000))
    P3 = da.from_array(P3, chunks=(1000))

    # Threshold
    P1[P1 < tol] = tol
    P2[P2 < tol] = tol

    for iter_number in range(20):
        T = P3[:, None, None] * (P1[:, :, None] @ P2[:, None, :])
        T *= coef  # Problematic line

        P1 = T.sum(2) / T.sum((1, 2))[:, None]  # (K, M)
        P2 = T.sum(1) / T.sum((1, 2))[:, None]
        P3 = T.sum((1, 2)) / T.sum()

        # Threshold
        P1[P1 < tol] = tol
        P2[P2 < tol] = tol

    return T, P1, P2, P3


if __name__ == "__main__":
    # coef is a numpy.ndarray
    M, N = 2000, 800
    coef_1 = np.random.random((M, N))
    # Make it sparse
    coef_1[coef_1 < 0.92] = 0

    # coef is a scipy.sparse matrix
    coef_2 = sp.csr_matrix(coef_1)

    # coef is a sparse COO matrix
    coef_3 = sparse.COO.from_numpy(coef_1)

    T, P1, P2, P3 = func_test(coef_1)
    T = T.compute()
```

If the argument passed to func_test is coef_2 or coef_3, the code raises an error.

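If the argument is coef_2, Python raises a ValueError:

ValueError: could not interpret dimensions

If it is coef_3, a different ValueError is raised:

ValueError: Please make sure that the broadcast shape of just the sparse arrays is the same as the broadcast shape of all the operands.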

I am using %timeit in IPython to measure the execution time:

Using coef_1: 869 ms ± 52.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Using coef_2: 322 ms ± 3.98 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Using coef_3: 249 ms ± 2.82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

My question is: is it possible to perform operations between Dask N-dimensional arrays and sparse matrices?

In-place assignment (T *= coef) seems to save a lot of time.
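To make the remark about in-place assignment concrete, here is a small NumPy-only sketch (an assumption for illustration; it does not involve Dask or the sparse containers) showing that the in-place form writes into the existing buffer, while the out-of-place form allocates a second (K, M, N) array.

```python
import numpy as np

rng = np.random.RandomState(0)
K, M, N = 3, 4, 5            # made-up sizes for the example
T = rng.rand(K, M, N)
coef = rng.rand(M, N)

# Out-of-place: a new (K, M, N) array is allocated for the result.
expected = T * coef

# In-place: the product is written back into T's existing memory.
address_before = T.__array_interface__["data"][0]
T *= coef
address_after = T.__array_interface__["data"][0]

same_buffer = address_before == address_after
```

For large arrays, avoiding the extra allocation can reduce both peak memory and run time, although the actual gain depends on the array library and on how it schedules the operation.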

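Some notes

I tried using

T = coef.multiply(coef_2)

but it increases the execution time:

4.65 s ± 173 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The values of K, M and N will be larger than those shown in the code, so a memory-light approach is required.
I am using Dask, but other suggestions are welcome. In particular, the operation

T = P3[:, None, None] * (P1[:, :, None] @ P2[:, None, :])

causes a MemoryError on my computer when the matrices are too large.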
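One possible way to sidestep the full (K, M, N) array, sketched here under stated assumptions, is to evaluate T only at the nonzero positions of the coefficient matrix, since every other entry of T * coef is zero anyway. The helper name update_from_nonzeros is hypothetical, the sketch uses NumPy only (no Dask, no sparse containers), and it leaves out the thresholding step; it is an illustration of the idea, not a tested replacement for the code above.

```python
import numpy as np


def update_from_nonzeros(P1, P2, P3, coef):
    """One update of P1, P2, P3 using only the nonzero entries of coef."""
    K, M = P1.shape
    N = P2.shape[1]

    rows, cols = np.nonzero(coef)      # (m, n) positions where coef != 0
    vals = coef[rows, cols]

    # T[k, m, n] = P3[k] * P1[k, m] * P2[k, n] * coef[m, n],
    # evaluated only at the nonzero positions: shape (K, nnz).
    T_nz = P3[:, None] * P1[:, rows] * P2[:, cols] * vals

    total = T_nz.sum(axis=1)           # same as T.sum((1, 2))
    # Scatter-add the nonzero values back onto the m axis and the n axis.
    sum_over_n = np.stack([np.bincount(rows, weights=t, minlength=M) for t in T_nz])
    sum_over_m = np.stack([np.bincount(cols, weights=t, minlength=N) for t in T_nz])

    new_P1 = sum_over_n / total[:, None]   # same as T.sum(2) / T.sum((1, 2))[:, None]
    new_P2 = sum_over_m / total[:, None]   # same as T.sum(1) / T.sum((1, 2))[:, None]
    new_P3 = total / total.sum()           # same as T.sum((1, 2)) / T.sum()
    return new_P1, new_P2, new_P3


# Small made-up example, checked against the dense formulation.
rng = np.random.RandomState(0)
K, M, N = 3, 20, 8
P1 = rng.rand(K, M)
P1 /= P1.sum(1)[:, None]
P2 = rng.rand(K, N)
P2 /= P2.sum(1)[:, None]
P3 = rng.rand(K)
P3 /= P3.sum()
coef = rng.rand(M, N)
coef[coef < 0.7] = 0

new_P1, new_P2, new_P3 = update_from_nonzeros(P1, P2, P3, coef)

# Dense reference, as in the original loop body.
T = P3[:, None, None] * (P1[:, :, None] @ P2[:, None, :]) * coef
```

The intermediate array has shape (K, nnz) instead of (K, M, N), so its size scales with the number of nonzero coefficients. Whether this is fast enough for the real sizes of K, M and N would need to be measured.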

Dask array broadcasting does not seem to be compatible with sparse matrices
