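I have an array of about 13 GB, and I call numpy.var on it to compute the variance. However, the call allocates roughly another 13 GB while it runs. Why does it need O(N) extra space? Or am I calling numpy.var incorrectly?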
import numpy as np
# data = ...
print('Variance: ', np.var(data))
Answer 0 (score: 1)
This is faster without parallelization:
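import numpy as np
def var(a: np.ndarray, axis: int = 0):
    n = a.shape[axis]  # length along the reduced axis (len(a) is only right for axis=0)
    mean = a.sum(axis=axis, keepdims=True) / n
    # Note: a - mean still allocates a temporary array the same size as a
    return np.sum(np.abs(a - mean) ** 2, axis=axis) / n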
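Answer 1 (score: 0)
NumPy creates an intermediate array to evaluate abs(data - data.mean()) ** 2 when it computes the variance, which is where the extra O(N) memory goes. You can write your own variance function with an explicit loop and use Numba to make it fast: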
import numpy as np
import numba as nb
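@nb.njit(parallel=True)
def var_nb(a, ddof=0):
    n = len(a)
    m = a.sum() / n  # the mean divides by n; ddof only affects the final divisor
    v = 0.0  # float accumulator; prange treats += as a parallel reduction
    for i in nb.prange(n):
        v += abs(a[i] - m) ** 2
    return v / (n - ddof)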
np.random.seed(100)
a = np.random.rand(100_000)
print(np.var(a))
# 0.08349747560941487
print(var_nb(a))
# 0.08349747560941487
%timeit np.var(a)
# 143 µs ± 414 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit var_nb(a)
# 40.2 µs ± 530 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
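If Numba is not an option, the same idea can be sketched in plain NumPy by processing the array in chunks, so the temporary is only as large as one chunk. This is a minimal sketch, not NumPy's own implementation: the name var_chunked and the chunk_size parameter are hypothetical, and it assumes a real-valued 1-D array.

```python
import numpy as np

def var_chunked(a, ddof=0, chunk_size=1_000_000):
    """Two-pass variance of a real-valued 1-D array.

    Extra memory is bounded by chunk_size elements rather than len(a).
    chunk_size is an illustrative parameter, not part of any NumPy API.
    """
    n = a.shape[0]
    # Pass 1: the mean. Summing a float array is a plain reduction.
    mean = a.sum() / n
    # Pass 2: accumulate squared deviations one chunk at a time.
    total = 0.0
    for start in range(0, n, chunk_size):
        d = a[start:start + chunk_size] - mean  # temporary of one chunk
        total += np.dot(d, d)                   # sum of squares for this chunk
    return total / (n - ddof)
```

A larger chunk_size means fewer Python-level loop iterations but a bigger temporary, so it is worth tuning against the memory you can spare.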