I have a matrix X for which I am computing a weighted sum of intermediate matrix products. Here is a minimal reproducible example:
import numpy as np
random_state = np.random.RandomState(1)
n = 5
p = 10
X = random_state.rand(p, n) # 10x5
X_sum = np.zeros((n, n)) # 5x5
# The length of weights is not related to X's dims,
# but will always be smaller
y = 3
weights = random_state.rand(y)
for k in range(y):
    X_sum += np.dot(X.T[:, k + 1:],
                    X[:p - (k + 1), :]) * weights[k]
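This works fine and produces the results I expect. However, as n and y grow (into the hundreds or thousands), it becomes very expensive, because repeatedly computing matrix products is not exactly efficient.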
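There is, however, a distinct pattern in how the products are computed: as the iterations progress, the starting column of the slice of X.T moves to the right, while the ending row of the slice of X moves up. At iteration k, the slice of X.T starts at column k + 1 and the slice of X stops just before row p - (k + 1).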
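This effectively means that subsets of the same values are multiplied repeatedly (see Edit 2), which looks to me like an opportunity that could be exploited (for example, if I were computing the products by hand).
But I would rather not do anything by hand, and there may be a nice way to express the whole loop more efficiently with NumPy.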
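To make the pattern described above concrete, here is a minimal sketch (small sizes chosen only for illustration, and the name `lag` is my own) checking that the product at iteration k is the same as a sum of outer products of rows of X that sit k + 1 apart. It is a way of reading the slicing, not a faster implementation.

```python
import numpy as np

rng = np.random.RandomState(1)
p, n, k = 10, 5, 2
X = rng.rand(p, n)

# Product exactly as written in the loop from the question.
sliced = np.dot(X.T[:, k + 1:], X[:p - (k + 1), :])

# The same quantity as an explicit sum of outer products of rows
# that are (k + 1) apart; "lag" is an illustrative name.
lag = k + 1
outer_sum = sum(np.outer(X[t + lag], X[t]) for t in range(p - lag))

print(np.allclose(sliced, outer_sum))
```

Seen this way, each iteration pairs every row with the row `lag` positions before it, which is why the same rows keep reappearing across iterations.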
A realistic set of numbers:
n = 400
p = 2000
y = 750
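As a rough back-of-the-envelope sketch of why this gets expensive, the snippet below counts floating-point operations for the realistic sizes, assuming the usual estimate of about 2 * n * n * m operations for an (n, m) by (m, n) product. The estimate ignores the weighting and the accumulation into X_sum.

```python
n, p, y = 400, 2000, 750

# Iteration k multiplies an (n, p-k-1) matrix by a (p-k-1, n) matrix,
# which takes roughly 2 * n * n * (p-k-1) floating-point operations.
flops = sum(2 * n * n * (p - (k + 1)) for k in range(y))

print(f"{flops:.3e}")
```

That comes to a few hundred billion operations for a single evaluation of the loop.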
In response to a comment:
"Can you explain which values get multiplied repeatedly?"
Consider the following array:
n = p = 5
X = np.arange(25).reshape(p, n)
For k=0, the first product is between A and B:
k = 0
A = X.T[:, k + 1:]
B = X[:p - (k + 1), :]
>>> A
array([[ 5, 10, 15, 20],
[ 6, 11, 16, 21],
[ 7, 12, 17, 22],
[ 8, 13, 18, 23],
[ 9, 14, 19, 24]])
>>> B
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
And when k=1:
k = 1
>>> A
array([[10, 15, 20],
[11, 16, 21],
[12, 17, 22],
[13, 18, 23],
[14, 19, 24]])
>>> B
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
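So the operands of each successive product are sub-blocks of the previous ones (A loses its first column and B loses its last row), if that makes sense.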
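The relationship between consecutive iterations can be checked directly. This is a small sketch using the same 5x5 example; the helper name `operands` is hypothetical and only used here.

```python
import numpy as np

n = p = 5
X = np.arange(25).reshape(p, n)

def operands(k):
    """Return the two slices multiplied at iteration k."""
    return X.T[:, k + 1:], X[:p - (k + 1), :]

A0, B0 = operands(0)
A1, B1 = operands(1)

same_A = np.array_equal(A1, A0[:, 1:])   # A loses its first column
same_B = np.array_equal(B1, B0[:-1, :])  # B loses its last row
print(same_A, same_B)
```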
Answer 0 (score: 4)
TLDR; based on benchmarking across various values of n, p, and y, I would choose @Parfait's test_gen_sum. I am keeping the old answer here for continuity.
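How n, p, y affect the choice of algorithm
This analysis was done using @Parfait's functions, to determine whether there really is one best solution or whether the best choice depends on the values of n, p, and y.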
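from functools import reduce  # needed by test_reduce_sum

import numpy as np
import pytest  # This code also requires the pytest-benchmark plugin

# Note: the helper functions keep the test_ prefix from @Parfait's answer, so
# pytest will also try to collect them directly and report errors for the
# missing n, p, y fixtures; only the test_test_* benchmarks are meaningful.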
def test_for_sum(n, p, y):
    random_state = np.random.RandomState(1)
    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))
    # The length of weights is not related to X's dims,
    # but will always be smaller
    weights = random_state.rand(y)
    for k in range(y):
        X_sum += np.dot(X.T[:, k + 1:],
                        X[:p - (k + 1), :]) * weights[k]
    return X_sum
def test_list_sum(n, p, y):
    random_state = np.random.RandomState(1)
    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))
    # The length of weights is not related to X's dims,
    # but will always be smaller
    weights = random_state.rand(y)
    matrix_list = [np.dot(X.T[:, k + 1:],
                          X[:p - (k + 1), :]) * weights[k] for k in range(y)]
    X_sum = np.sum(matrix_list, axis=0)
    return X_sum
def test_reduce_sum(n, p, y):
    random_state = np.random.RandomState(1)
    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))
    # The length of weights is not related to X's dims,
    # but will always be smaller
    weights = random_state.rand(y)
    matrix_list = [(X.T[:, k + 1:] @
                    X[:p - (k + 1), :]) * weights[k] for k in range(y)]
    X_sum = reduce(lambda x, y: x + y, matrix_list)
    return X_sum
def test_concat_sum(n, p, y):
    random_state = np.random.RandomState(1)
    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))
    # The length of weights is not related to X's dims,
    # but will always be smaller
    weights = random_state.rand(y)
    x_mat = np.concatenate([np.matmul(X.T[:, k + 1:],
                                      X[:p - (k + 1), :]) for k in range(y)])
    wgt_mat = np.concatenate([np.full((n, 1), weights[k]) for k in range(y)])
    mul_res = x_mat * wgt_mat
    X_sum = mul_res.reshape(-1, n, n).sum(axis=0)
    return X_sum
def test_matmul_sum(n, p, y):
    random_state = np.random.RandomState(1)
    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))
    # The length of weights is not related to X's dims,
    # but will always be smaller
    weights = random_state.rand(y)
    # Use list comprehension and np.matmul
    matrices_list = [np.matmul(X.T[:, k + 1:],
                               X[:p - (k + 1), :]) * weights[k] for k in range(y)]
    # Sum matrices in list of matrices to get the final result
    X_sum = np.sum(matrices_list, axis=0)
    return X_sum
def test_gen_sum(n, p, y):
    random_state = np.random.RandomState(1)
    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))
    # The length of weights is not related to X's dims,
    # but will always be smaller
    weights = random_state.rand(y)
    matrix_gen = (np.dot(X.T[:, k + 1:],
                         X[:p - (k + 1), :]) * weights[k] for k in range(y))
    X_sum = sum(matrix_gen)
    return X_sum
parameters = [
    pytest.param(400, 800, 3),
    pytest.param(400, 2000, 3),
    pytest.param(400, 800, 750),
    pytest.param(400, 2000, 750),
]
@pytest.mark.parametrize('n,p,y', parameters)
def test_test_for_sum(benchmark, n, p, y):
    benchmark(test_for_sum, n=n, p=p, y=y)

@pytest.mark.parametrize('n,p,y', parameters)
def test_test_list_sum(benchmark, n, p, y):
    benchmark(test_list_sum, n=n, p=p, y=y)

@pytest.mark.parametrize('n,p,y', parameters)
def test_test_reduce_sum(benchmark, n, p, y):
    benchmark(test_reduce_sum, n=n, p=p, y=y)

@pytest.mark.parametrize('n,p,y', parameters)
def test_test_concat_sum(benchmark, n, p, y):
    benchmark(test_concat_sum, n=n, p=p, y=y)

@pytest.mark.parametrize('n,p,y', parameters)
def test_test_matmul_sum(benchmark, n, p, y):
    benchmark(test_matmul_sum, n=n, p=p, y=y)

@pytest.mark.parametrize('n,p,y', parameters)
def test_test_gen_sum(benchmark, n, p, y):
    benchmark(test_gen_sum, n=n, p=p, y=y)
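Preferred function for each parameter set, per the benchmark:
y=3, n=400, p=800 (100 iterations): test_gen_sum
y=3, n=400, p=2000 (100 iterations): test_gen_sum
y=750, n=400, p=800 (10 iterations): test_gen_sum
y=750, n=400, p=2000 (10 iterations): test_gen_sum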
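Old answer: smaller y values
I would definitely use np.matmul instead of np.dot; this gives you the biggest performance gain. In fact, the documentation for np.dot itself directs you to use np.matmul for 2-D array multiplication instead of np.dot.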
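For 2-D arrays the two functions compute the same matrix product, so switching between them does not change the result. A minimal sketch with arbitrary shapes chosen for illustration:

```python
import numpy as np

rng = np.random.RandomState(0)
A = rng.rand(4, 6)
B = rng.rand(6, 3)

via_dot = np.dot(A, B)        # 2-D inputs: ordinary matrix multiplication
via_matmul = np.matmul(A, B)  # the form the np.dot docs prefer for 2-D
via_operator = A @ B          # operator spelling of np.matmul

print(np.allclose(via_dot, via_matmul), np.allclose(via_matmul, via_operator))
```

The two functions only differ in behavior for inputs with more than two dimensions, which does not arise in this problem.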
I tested both np.dot and np.matmul, with and without a list comprehension; these are the pytest-benchmark results:
As an aside, pytest-benchmark is pretty slick, and I highly recommend it in situations like this to validate whether an approach really is faster.
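If pytest-benchmark is not available, a rough stand-in can be put together from the standard library. The sketch below is not pytest-benchmark; it uses `timeit.repeat` and `statistics`, and the helper name `time_call`, the array size, and the repeat counts are all assumptions chosen for illustration.

```python
import statistics
import timeit

import numpy as np

rng = np.random.RandomState(1)
X = rng.rand(200, 50)

def time_call(fn, repeat=5, number=20):
    """Return (mean, stdev) of the per-call time in seconds."""
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    per_call = [t / number for t in totals]
    return statistics.mean(per_call), statistics.stdev(per_call)

results = {
    "np.dot": time_call(lambda: np.dot(X.T, X)),
    "np.matmul": time_call(lambda: np.matmul(X.T, X)),
}
for name, (mean, stdev) in results.items():
    print(f"{name}: mean={mean:.2e}s stdev={stdev:.2e}s")
```

Reporting the standard deviation alongside the mean helps show whether a difference between two variants is larger than the run-to-run noise.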
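In the grand scheme of things, using a list comprehension on its own has almost no effect on the np.matmul results and a negative effect on np.dot (although it is the better form), but the combination of both changes yielded the best results. I would caution that using list comprehensions tends to raise the std. dev. of the runtimes, so you may see a wider range of runtime performance than if you just used np.dot.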
Here is the code:
import numpy as np

def test_np_matmul_list_comprehension():
    random_state = np.random.RandomState(1)
    n = p = 1000
    X = np.arange(n * n).reshape(p, n)
    # The length of weights is not related to X's dims,
    # but will always be smaller
    y = 3
    weights = [1, 1, 1]
    # Use list comprehension and np.matmul
    matrices_list = [np.matmul(X.T[:, k + 1:],
                               X[:p - (k + 1), :]) * weights[k] for k in range(y)]
    # Sum matrices in list of matrices to get the final result
    X_sum = np.sum(matrices_list, axis=0)
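Larger y values
For larger values of y, it is better not to use a list comprehension. For both np.dot and np.matmul, the mean and median runtimes tend to be larger with it. Here are the pytest-benchmark results for a run with n=500: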
This is probably overkill, but I would rather err on the side of being too helpful :).
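On the point about larger y values: one plausible contributing factor (my assumption, not something the benchmarks above isolate) is memory. A list comprehension keeps every intermediate n-by-n product alive until the final sum, while a running sum or a generator only needs about two at a time. A quick arithmetic sketch for float64 data:

```python
n, y = 400, 750
bytes_per_float = 8  # float64

# A list comprehension keeps all y intermediate (n, n) products alive
# until they are summed; a running sum needs roughly the accumulator
# plus the product currently being added.
list_bytes = y * n * n * bytes_per_float
running_bytes = 2 * n * n * bytes_per_float

print(list_bytes / 1e9, "GB vs", running_bytes / 1e6, "MB")
```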
Answer 1 (score: 3)
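Consider the following refactored versions as alternatives to calling an iterative sum inside a for loop. The new versions using reduce, a generator, and np.concatenate are marginally faster, but still comparable to the for loop. Each was run with n = 400, p = 800, y = 750.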
OP's original version
import numpy as np

def test_for_sum():
    random_state = np.random.RandomState(1)
    n = 400
    p = 800
    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))
    # The length of weights is not related to X's dims,
    # but will always be smaller
    y = 750
    weights = random_state.rand(y)
    for k in range(y):
        X_sum += np.dot(X.T[:, k + 1:],
                        X[:p - (k + 1), :]) * weights[k]
    return X_sum
List comprehension with np.dot
def test_list_sum():
    random_state = np.random.RandomState(1)
    n = 400
    p = 800
    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))
    # The length of weights is not related to X's dims,
    # but will always be smaller
    y = 750
    weights = random_state.rand(y)
    matrix_list = [np.dot(X.T[:, k + 1:],
                          X[:p - (k + 1), :]) * weights[k] for k in range(y)]
    X_sum = sum(matrix_list)
    return X_sum
Generator version
def test_gen_sum():
    random_state = np.random.RandomState(1)
    n = 400
    p = 800
    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))
    # The length of weights is not related to X's dims,
    # but will always be smaller
    y = 750
    weights = random_state.rand(y)
    matrix_gen = (np.dot(X.T[:, k + 1:],
                         X[:p - (k + 1), :]) * weights[k] for k in range(y))
    X_sum = sum(matrix_gen)
    return X_sum
Reduce version (using the newer @ operator, syntactic sugar for np.matmul)
from functools import reduce

def test_reduce_sum():
    random_state = np.random.RandomState(1)
    n = 400
    p = 800
    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))
    # The length of weights is not related to X's dims,
    # but will always be smaller
    y = 750
    weights = random_state.rand(y)
    matrix_list = [(X.T[:, k + 1:] @
                    X[:p - (k + 1), :]) * weights[k] for k in range(y)]
    X_sum = reduce(lambda x, y: x + y, matrix_list)
    return X_sum
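As a small aside on the reduce version, the lambda can be swapped for `operator.add` from the standard library, which also avoids reusing the name y inside the lambda. A minimal sketch on a few small random matrices (sizes are arbitrary):

```python
from functools import reduce
import operator

import numpy as np

rng = np.random.RandomState(1)
matrices = [rng.rand(3, 3) for _ in range(4)]

# Both forms add the matrices left to right, in the same order.
total_lambda = reduce(lambda a, b: a + b, matrices)
total_operator = reduce(operator.add, matrices)

print(np.array_equal(total_lambda, total_operator))
```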
Concatenate version
def test_concat_sum():
    random_state = np.random.RandomState(1)
    n = 400
    p = 800
    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))
    # The length of weights is not related to X's dims,
    # but will always be smaller
    y = 750
    weights = random_state.rand(y)
    x_mat = np.concatenate([np.matmul(X.T[:, k + 1:],
                                      X[:p - (k + 1), :]) for k in range(y)])
    wgt_mat = np.concatenate([np.full((n, 1), weights[k]) for k in range(y)])
    mul_res = x_mat * wgt_mat
    X_sum = mul_res.reshape(-1, n, n).sum(axis=0)
    return X_sum
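The weighting step in the concatenate version builds a (y * n, 1) column of repeated weights and relies on broadcasting. The sketch below, on deliberately tiny sizes, reproduces that step and compares it with an `np.einsum` contraction over a stacked (y, n, n) array. The einsum form is my own alternative spelling, not part of the answer, and no speed claim is intended.

```python
import numpy as np

rng = np.random.RandomState(1)
n, p, y = 4, 12, 3
X = rng.rand(p, n)
weights = rng.rand(y)

# Stack the y products into one (y, n, n) array.
stack = np.stack([X.T[:, k + 1:] @ X[:p - (k + 1), :] for k in range(y)])

# Concatenate-style weighting, mirroring test_concat_sum:
# each weight is repeated n times to line up with its block of n rows.
col = np.repeat(weights, n)[:, None]  # shape (y * n, 1)
concat_sum = (stack.reshape(-1, n) * col).reshape(-1, n, n).sum(axis=0)

# The same weighted sum written as a contraction over the leading axis.
einsum_sum = np.einsum("k,kij->ij", weights, stack)

print(np.allclose(concat_sum, einsum_sum))
```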
List comprehension with np.matmul
def test_matmul_sum():
    random_state = np.random.RandomState(1)
    n = 400
    p = 800
    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))
    # The length of weights is not related to X's dims,
    # but will always be smaller
    y = 750
    weights = random_state.rand(y)
    # Use list comprehension and np.matmul
    matrices_list = [np.matmul(X.T[:, k + 1:],
                               X[:p - (k + 1), :]) * weights[k] for k in range(y)]
    # Sum matrices in list of matrices to get the final result
    X_sum = np.sum(matrices_list, axis=0)
    return X_sum
import time
start_time = time.time()
res_for = test_for_sum()
print("SUM: {} seconds ---".format(time.time() - start_time))
start_time = time.time()
res_list = test_list_sum()
print("LIST: {} seconds ---".format(time.time() - start_time))
start_time = time.time()
res_gen = test_gen_sum()
print("GEN: {} seconds ---".format(time.time() - start_time))
start_time = time.time()
res_reduce= test_reduce_sum()
print("REDUCE: {} seconds ---".format(time.time() - start_time))
start_time = time.time()
res_concat = test_concat_sum()
print("CONCAT: {} seconds ---".format(time.time() - start_time))
start_time = time.time()
res_matmul = test_matmul_sum()
print("MATMUL: {} seconds ---".format(time.time() - start_time))
Equality tests
print(np.array_equal(res_for, res_list))
# True
print(np.array_equal(res_for, res_gen))
# True
print(np.array_equal(res_for, res_reduce))
# True
print(np.array_equal(res_for, res_concat))
# True
print(np.array_equal(res_for, res_matmul))
# True
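A caveat on exact equality, offered as a general floating-point note rather than a correction of the results above: variants that add the same terms in a different order can differ in the last bits, in which case `np.array_equal` reports False even though the results agree for practical purposes. `np.allclose` is the more forgiving check. A tiny sketch:

```python
import numpy as np

# Same three numbers, added in a different order.
a = np.array([(0.1 + 0.2) + 0.3])
b = np.array([0.1 + (0.2 + 0.3)])

exact = np.array_equal(a, b)  # bit-for-bit comparison
close = np.allclose(a, b)     # comparison within a tolerance

print(exact, close)
```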
First run
# SUM: 21.569773197174072 seconds ---
# LIST: 23.576102018356323 seconds ---
# GEN: 21.385253429412842 seconds ---
# REDUCE: 21.426464080810547 seconds ---
# CONCAT: 21.059731483459473 seconds ---
# MATMUL: 23.57494807243347 seconds ---
Second run
# SUM: 21.6339168548584 seconds ---
# LIST: 19.767740488052368 seconds ---
# GEN: 23.86947798728943 seconds ---
# REDUCE: 19.880712032318115 seconds ---
# CONCAT: 20.761067152023315 seconds ---
# MATMUL: 23.55513620376587 seconds ---
Third run
# SUM: 22.764745473861694 seconds ---
# LIST: 19.953850984573364 seconds ---
# GEN: 24.37714171409607 seconds ---
# REDUCE: 22.54508638381958 seconds ---
# CONCAT: 21.20585823059082 seconds ---
# MATMUL: 22.303589820861816 seconds ---
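To read the three runs side by side, the sketch below computes the mean and standard deviation per method. The timings are copied from the runs above and rounded to two decimals. With only three samples each, the spread is of a similar size to the gaps between methods, so this is a summary aid rather than a ranking.

```python
import statistics

# Timings in seconds, copied from the three runs above (rounded to 2 decimals).
runs = {
    "SUM":    [21.57, 21.63, 22.76],
    "LIST":   [23.58, 19.77, 19.95],
    "GEN":    [21.39, 23.87, 24.38],
    "REDUCE": [21.43, 19.88, 22.55],
    "CONCAT": [21.06, 20.76, 21.21],
    "MATMUL": [23.57, 23.56, 22.30],
}

summary = {name: (statistics.mean(t), statistics.stdev(t))
           for name, t in runs.items()}

# Print methods from lowest to highest mean time.
for name, (mean, stdev) in sorted(summary.items(), key=lambda kv: kv[1][0]):
    print(f"{name:<7} mean={mean:.2f}s stdev={stdev:.2f}s")
```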