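Efficiently summing complex matrix products with Numpy

Date: 2018-12-30 21:43:19

Tags: python numpy

I have a matrix X for which I am computing a weighted sum of intermediate matrix products. Here is a minimal reproducible example: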

import numpy as np

random_state = np.random.RandomState(1)
n = 5
p = 10

X = random_state.rand(p, n) # 10x5
X_sum = np.zeros((n, n)) # 5x5

# The length of weights are not related to X's dims,
# but will always be smaller
y = 3
weights = random_state.rand(y)

for k in range(y):
    X_sum += np.dot(X.T[:, k + 1:],
                    X[:p - (k + 1), :]) * weights[k]

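This works fine and produces the result I expect. However, as n and y grow (into the hundreds or thousands), it becomes very expensive, because repeatedly computing matrix products is not exactly efficient...

However, there is a clear pattern in how the products are computed: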

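[Image: First iteration]

[Image: Second iteration]

You can see that as the iterations progress, the starting column of the slice of Xt moves to the right, while the ending row of the slice of X moves up. This is what the Nth iteration looks like:

[Image: Nth iteration]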

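This effectively means that the same subsets of values are multiplied repeatedly (see Edit 2), which looks to me like an opportunity to exploit… (for example, if I were computing the products by hand).

But I would rather not do anything by hand, and there may be a nice way to implement the whole loop better with Numpy.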
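One way the repetition could be exploited, sketched here as an illustration and not taken from the answers below: entry (a, b) of the k-th product is the sum over i of X[i + k + 1, a] * X[i, b], so the weights can first be folded into a shifted, weighted copy of X, after which the whole loop collapses into a single matrix product. The function names and the helper array Z are hypothetical, and the sketch assumes len(weights) < p, as in the example above.

```python
import numpy as np


def weighted_sum_loop(X, weights):
    """Reference version: the loop from the question."""
    p, n = X.shape
    X_sum = np.zeros((n, n))
    for k, w in enumerate(weights):
        X_sum += np.dot(X.T[:, k + 1:], X[:p - (k + 1), :]) * w
    return X_sum


def weighted_sum_single_matmul(X, weights):
    """Hypothetical alternative: accumulate shifted rows, then multiply once.

    Assumes len(weights) < p.
    """
    p, n = X.shape
    Z = np.zeros((p, n))
    for k, w in enumerate(weights):
        # Row i of Z collects w_k * X[i + k + 1] for every valid shift k.
        Z[:p - (k + 1)] += w * X[k + 1:]
    # X_sum[a, b] = sum_i Z[i, a] * X[i, b]
    return Z.T @ X


random_state = np.random.RandomState(1)
X = random_state.rand(10, 5)
weights = random_state.rand(3)
print(np.allclose(weighted_sum_loop(X, weights),
                  weighted_sum_single_matmul(X, weights)))
```

The two versions add the same terms in a different order, so the results agree up to floating-point rounding rather than bit for bit. The accumulation into Z costs about y * p * n multiply-adds and is followed by one product, instead of y separate products.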

Edit 1

A realistic set of numbers:

n = 400
p = 2000
y = 750
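To get a feel for why the loop is expensive at these sizes, the following rough sketch counts floating-point operations, assuming the usual estimate of about 2 * n * n * m operations for an (n x m) by (m x n) product. The helper name loop_flops is hypothetical, and the cheaper per-iteration scaling by weights[k] is ignored.

```python
def loop_flops(n, p, y):
    """Approximate operation count of the loop in the question.

    Iteration k multiplies an (n x m) matrix by an (m x n) matrix,
    where m = p - (k + 1).
    """
    return sum(2 * n * n * (p - (k + 1)) for k in range(y))


n, p, y = 400, 2000, 750
print("{:.3e} floating-point operations".format(loop_flops(n, p, y)))
```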

Edit 2

To address the comment:

Could you explain which values are multiplied repeatedly?

Consider the following array:

n = p = 5
X = np.arange(25).reshape(p, n)

For k=0, the first product is between A and B:

k = 0
A = X.T[:, k + 1:]
B = X[:p - (k + 1), :]
>>> A
array([[ 5, 10, 15, 20],
       [ 6, 11, 16, 21],
       [ 7, 12, 17, 22],
       [ 8, 13, 18, 23],
       [ 9, 14, 19, 24]])
>>> B
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

For k=1:

k = 1
>>> A
array([[10, 15, 20],
       [11, 16, 21],
       [12, 17, 22],
       [13, 18, 23],
       [14, 19, 24]])
>>> B
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

So each subsequent matrix product is built from a subset of the operands of the previous one, if that makes sense.
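The subset relationship can be checked directly. In this small sketch, the helper operands is a hypothetical name for the two slices used at iteration k:

```python
import numpy as np

n = p = 5
X = np.arange(25).reshape(p, n)


def operands(k):
    """Return the two slices multiplied at iteration k."""
    return X.T[:, k + 1:], X[:p - (k + 1), :]


A0, B0 = operands(0)
A1, B1 = operands(1)

# A loses its first column and B loses its last row at every step.
print(np.array_equal(A1, A0[:, 1:]))
print(np.array_equal(B1, B0[:-1, :]))
```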

2 answers:

Answer 0: (score: 4)

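TLDR; Based on benchmarking across various values of n, p, and y, I would choose @Parfait's test_gen_sum. The old answer is kept below for continuity.

Evaluating how n, p, and y affect the choice of algorithm

This analysis was done using @Parfait's functions, to determine whether there really is one best solution or whether the best choice depends on the values of n, p, and y.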
import numpy as np
import pytest
from functools import reduce  # needed by test_reduce_sum

# This code also requires the pytest-benchmark plugin.
# Note: the helper functions below are also named test_*, so pytest collects
# them too and reports missing fixtures for n, p, y; selecting the benchmarks
# with `-k test_test` avoids that.

def test_for_sum(n, p, y):
    random_state = np.random.RandomState(1)
    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))

    # The length of weights are not related to X's dims,
    # but will always be smaller
    weights = random_state.rand(y)

    for k in range(y):
        X_sum += np.dot(X.T[:, k + 1:],
                        X[:p - (k + 1), :]) * weights[k]

    return X_sum

def test_list_sum(n, p, y):
    random_state = np.random.RandomState(1)
    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))

    # The length of weights are not related to X's dims,
    # but will always be smaller
    weights = random_state.rand(y)

    matrix_list = [np.dot(X.T[:, k + 1:],
                          X[:p - (k + 1), :]) * weights[k] for k in range(y)]

    X_sum = np.sum(matrix_list, axis=0)

    return X_sum

def test_reduce_sum(n, p, y):
    random_state = np.random.RandomState(1)
    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))

    # The length of weights are not related to X's dims,
    # but will always be smaller
    weights = random_state.rand(y)

    matrix_list = [(X.T[:, k + 1:] @
                    X[:p - (k + 1), :]) * weights[k] for k in range(y)]

    X_sum = reduce(lambda x,y: x + y, matrix_list)

    return X_sum

def test_concat_sum(n, p, y):
    random_state = np.random.RandomState(1)
    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))

    # The length of weights are not related to X's dims,
    # but will always be smaller
    weights = random_state.rand(y)

    x_mat = np.concatenate([np.matmul(X.T[:, k + 1:],
                                      X[:p - (k + 1), :]) for k in range(y)])

    wgt_mat = np.concatenate([np.full((n,1), weights[k]) for k in range(y)])

    mul_res = x_mat * wgt_mat
    X_sum = mul_res.reshape(-1, n, n).sum(axis=0)

    return X_sum

def test_matmul_sum(n, p, y):
    random_state = np.random.RandomState(1)
    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))

    # The length of weights are not related to X's dims,
    # but will always be smaller
    weights = random_state.rand(y)

    # Use list comprehension and np.matmul
    matrices_list = [np.matmul(X.T[:, k + 1:],
                               X[:p - (k + 1), :]) * weights[k] for k in range(y)]

    # Sum matrices in list of matrices to get the final result
    X_sum = np.sum(matrices_list, axis=0)

    return X_sum

def test_gen_sum(n, p, y):
    random_state = np.random.RandomState(1)
    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))

    # The length of weights are not related to X's dims,
    # but will always be smaller
    weights = random_state.rand(y)

    matrix_gen = (np.dot(X.T[:, k + 1:],
                         X[:p - (k + 1), :]) * weights[k] for k in range(y))

    X_sum = sum(matrix_gen)

    return X_sum

parameters = [
    pytest.param(400, 800, 3),
    pytest.param(400, 2000, 3),
    pytest.param(400, 800, 750),
    pytest.param(400, 2000, 750),
]

@pytest.mark.parametrize('n,p,y', parameters)
def test_test_for_sum(benchmark, n, p, y):
    benchmark(test_for_sum, n=n, p=p, y=y)

@pytest.mark.parametrize('n,p,y', parameters)
def test_test_list_sum(benchmark, n, p, y):
    benchmark(test_list_sum, n=n, p=p, y=y)

@pytest.mark.parametrize('n,p,y', parameters)
def test_test_reduce_sum(benchmark, n, p, y):
    benchmark(test_reduce_sum, n=n, p=p, y=y)

@pytest.mark.parametrize('n,p,y', parameters)
def test_test_concat_sum(benchmark, n, p, y):
    benchmark(test_concat_sum, n=n, p=p, y=y)

@pytest.mark.parametrize('n,p,y', parameters)
def test_test_matmul_sum(benchmark, n, p, y):
    benchmark(test_matmul_sum, n=n, p=p, y=y)

@pytest.mark.parametrize('n,p,y', parameters)
def test_test_gen_sum(benchmark, n, p, y):
    benchmark(test_gen_sum, n=n, p=p, y=y)

  • n=400, p=800, y=3 (100 iterations)

    • Winner: test_gen_sum [Image: pytest-benchmark results]
  • n=400, p=2000, y=3 (100 iterations)

    • Winner: test_gen_sum [Image: pytest-benchmark results]
  • n=400, p=800, y=750 (10 iterations)

    • Winner: test_gen_sum [Image: pytest-benchmark results]
  • n=400, p=2000, y=750 (10 iterations)

    • Winner: test_gen_sum [Image: pytest-benchmark results]

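Old answer

Smaller y values

I would definitely use np.matmul instead of np.dot; this will give you the biggest performance gain, and in fact the documentation for np.dot directs you to np.matmul for 2-D array multiplication instead of np.dot.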
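For 2-D arrays the three spellings compute the same product, so switching between them does not change the result. This small sketch, with made-up arrays A and B, only illustrates that equivalence; any speed difference would still need to be measured on the actual data.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)

via_dot = np.dot(A, B)
via_matmul = np.matmul(A, B)
via_operator = A @ B  # the @ operator calls the same machinery as np.matmul

all_equal = (np.array_equal(via_dot, via_matmul)
             and np.array_equal(via_dot, via_operator))
print(all_equal)
```

The functions only diverge for arrays with more than two dimensions, where np.matmul broadcasts over the leading axes and np.dot does not.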

I tested both np.dot and np.matmul, with and without a list comprehension; the pytest-benchmark results are here:

[Image: pytest-benchmark results, y=3]

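By the way, pytest-benchmark is pretty slick, and I highly recommend it in situations like this to verify whether an approach really performs better.

In the grand scheme of things, using a list comprehension alone has almost no effect on the np.matmul results and a negative effect on np.dot (although it is better form), but the combination of both changes produced the best results. I would caution that list comprehensions tend to drive up the standard deviation of the runtimes, so you may see a wider range in runtime performance than if you only use np.matmul.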
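If pytest-benchmark is not available, the mean and standard deviation of the runtimes can also be estimated with the standard library. The following is a minimal sketch with deliberately small, made-up sizes; the function names with_loop, with_list_comprehension, and summarize are hypothetical.

```python
import statistics
import timeit

import numpy as np

random_state = np.random.RandomState(1)
n, p, y = 20, 50, 3  # deliberately small so the sketch runs quickly
X = random_state.rand(p, n)
weights = random_state.rand(y)


def with_loop():
    X_sum = np.zeros((n, n))
    for k in range(y):
        X_sum += np.matmul(X.T[:, k + 1:], X[:p - (k + 1), :]) * weights[k]
    return X_sum


def with_list_comprehension():
    matrices = [np.matmul(X.T[:, k + 1:], X[:p - (k + 1), :]) * weights[k]
                for k in range(y)]
    return np.sum(matrices, axis=0)


def summarize(func, repeat=5, number=20):
    """Return (mean, standard deviation) of the per-call runtime in seconds."""
    runs = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [t / number for t in runs]
    return statistics.mean(per_call), statistics.stdev(per_call)


for func in (with_loop, with_list_comprehension):
    mean, stdev = summarize(func)
    print("{}: mean {:.2e} s, std. dev. {:.2e} s".format(func.__name__, mean, stdev))
```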

The code is as follows:

import numpy as np

def test_np_matmul_list_comprehension():
    random_state = np.random.RandomState(1)
    n = p = 1000
    X = np.arange(n * n).reshape(p, n)

    # The length of weights are not related to X's dims,
    # but will always be smaller
    y = 3
    weights = [1, 1, 1]

    # Use list comprehension and np.matmul
    matrices_list = [np.matmul(X.T[:, k + 1:],
                               X[:p - (k + 1), :]) * weights[k] for k in range(y)]

    # Sum matrices in list of matrices to get the final result
    X_sum = np.sum(matrices_list, axis=0)

Larger y values

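For larger values of y, it is better not to use a list comprehension. The mean/median runtimes of both np.dot and np.matmul tend to be larger with it. Here are the pytest-benchmark results for (n=500, …):

[Image: pytest-benchmark results]

This is probably overkill, but I would rather err on the side of being too helpful :).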

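Answer 1: (score: 3)

Consider the following refactored versions, compared against the iterative sum calls in the for loop. The new versions using reduce, a generator, and np.concatenate run slightly faster, but remain comparable to the for loop. Each was run with n = 400, p = 800, y = 750.

OP's original version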

import numpy as np

def test_for_sum():
    random_state = np.random.RandomState(1)
    n= 400
    p = 800

    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))

    # The length of weights are not related to X's dims,
    # but will always be smaller
    y = 750
    weights = random_state.rand(y)

    for k in range(y):
        X_sum += np.dot(X.T[:, k + 1:],
                        X[:p - (k + 1), :]) * weights[k]

    return X_sum

List comprehension with np.dot

def test_list_sum():
    random_state = np.random.RandomState(1)
    n= 400
    p = 800

    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))

    # The length of weights are not related to X's dims,
    # but will always be smaller
    y = 750
    weights = random_state.rand(y)

    matrix_list = [np.dot(X.T[:, k + 1:],
                          X[:p - (k + 1), :]) * weights[k] for k in range(y)]

    X_sum = sum(matrix_list)

    return X_sum

Generator version

def test_gen_sum():
    random_state = np.random.RandomState(1)
    n= 400
    p = 800

    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))

    # The length of weights are not related to X's dims,
    # but will always be smaller
    y = 750
    weights = random_state.rand(y)

    matrix_gen = (np.dot(X.T[:, k + 1:],
                         X[:p - (k + 1), :]) * weights[k] for k in range(y))

    X_sum = sum(matrix_gen)

    return X_sum
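One practical difference between the list and generator versions is memory: the list holds all y weighted products at once, whereas sum() over a generator only needs the running total and the current product. A rough sketch of the list's footprint for the sizes used here, assuming float64 arrays (the variable names are illustrative):

```python
import numpy as np

n, y = 400, 750

# One (n x n) float64 matrix, as produced by each weighted product.
bytes_per_matrix = np.zeros((n, n)).nbytes

# The list comprehension keeps all y matrices alive until they are summed.
list_bytes = y * bytes_per_matrix

print("per matrix: {} bytes".format(bytes_per_matrix))
print("whole list: {:.2f} GB".format(list_bytes / 1e9))
```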

Reduce version (using the new @ operator, syntactic sugar for np.matmul)

from functools import reduce

def test_reduce_sum():
    random_state = np.random.RandomState(1)
    n= 400
    p = 800

    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))

    # The length of weights are not related to X's dims,
    # but will always be smaller
    y = 750
    weights = random_state.rand(y)

    matrix_list = [(X.T[:, k + 1:] @
                    X[:p - (k + 1), :]) * weights[k] for k in range(y)]

    X_sum = reduce(lambda x,y: x + y, matrix_list)

    return X_sum
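The lambda passed to reduce simply adds two arrays, so it can be swapped for operator.add, or for the built-in sum, without changing the result. A small sketch with made-up matrices:

```python
import operator
from functools import reduce

import numpy as np

# Stand-ins for the weighted products: 2x2 matrices filled with 1, 2 and 3.
matrices = [np.full((2, 2), float(k)) for k in range(1, 4)]

with_lambda = reduce(lambda acc, m: acc + m, matrices)
with_operator = reduce(operator.add, matrices)
with_builtin = sum(matrices)

print(with_lambda)
```

In the version above, the lambda's parameter named y shadows the outer y only inside the lambda body, so it is harmless, though a different name is easier to read.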

Concatenate version

def test_concat_sum():
    random_state = np.random.RandomState(1)
    n= 400
    p = 800

    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))

    # The length of weights are not related to X's dims,
    # but will always be smaller
    y = 750
    weights = random_state.rand(y)

    x_mat = np.concatenate([np.matmul(X.T[:, k + 1:],
                                      X[:p - (k + 1), :]) for k in range(y)])

    wgt_mat = np.concatenate([np.full((n,1), weights[k]) for k in range(y)])

    mul_res = x_mat * wgt_mat        
    X_sum = mul_res.reshape(-1, n, n).sum(axis=0)

    return X_sum
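The reshaping in the concatenate version is easier to follow on a tiny example. In this sketch the (n x n) products are replaced by constant stand-in blocks, and the np.tensordot line is shown only as an assumed equivalent way to express the same weighted sum; it is not part of the answer.

```python
import numpy as np

n, y = 2, 3
weights = np.array([1.0, 10.0, 100.0])

# Stand-ins for the y matrix products: blocks filled with 1, 2 and 3.
blocks = [np.full((n, n), float(k + 1)) for k in range(y)]

x_mat = np.concatenate(blocks)                       # shape (y * n, n)
wgt_mat = np.concatenate([np.full((n, 1), weights[k])
                          for k in range(y)])        # shape (y * n, 1)

mul_res = x_mat * wgt_mat                            # weight broadcast across each row
X_sum = mul_res.reshape(-1, n, n).sum(axis=0)        # back to (y, n, n), then sum over y

# Same weighted sum, contracting the weights against a stacked (y, n, n) array.
X_sum_alt = np.tensordot(weights, np.stack(blocks), axes=1)

print(X_sum)
```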

List comprehension with np.matmul

def test_matmul_sum():
    random_state = np.random.RandomState(1)
    n = 400
    p = 800
    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))

    # The length of weights are not related to X's dims,
    # but will always be smaller
    y = 750
    weights = random_state.rand(y)
    # Use list comprehension and np.matmul 
    matrices_list = [np.matmul(X.T[:, k + 1:],
                               X[:p - (k + 1), :]) * weights[k] for k in range(y)]

    # Sum matrices in list of matrices to get the final result   
    X_sum = np.sum(matrices_list, axis=0)

    return X_sum

Timings

import time

start_time = time.time()
res_for = test_for_sum()
print("SUM: {} seconds ---".format(time.time() - start_time))

start_time = time.time()
res_list = test_list_sum()
print("LIST: {} seconds ---".format(time.time() - start_time))

start_time = time.time()
res_gen = test_gen_sum()
print("GEN: {} seconds ---".format(time.time() - start_time))

start_time = time.time()
res_reduce= test_reduce_sum()
print("REDUCE: {} seconds ---".format(time.time() - start_time))

start_time = time.time()
res_concat = test_concat_sum()
print("CONCAT: {} seconds ---".format(time.time() - start_time))

start_time = time.time()
res_matmul = test_matmul_sum()
print("MATMUL: {} seconds ---".format(time.time() - start_time))
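The repeated start/stop pattern above can be wrapped in a small helper. This is a sketch rather than part of the answer: the name timed is hypothetical, and time.perf_counter is swapped in for time.time because it is a monotonic clock intended for measuring intervals.

```python
import time


def timed(label, func):
    """Call func(), print the elapsed wall time, and return (result, seconds)."""
    start = time.perf_counter()
    result = func()
    elapsed = time.perf_counter() - start
    print("{}: {} seconds ---".format(label, elapsed))
    return result, elapsed


# Usage with a cheap stand-in for one of the test_*_sum functions.
demo_result, demo_seconds = timed("DEMO", lambda: sum(range(1000)))
```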

Equality tests

print(np.array_equal(res_for, res_list))
# True
print(np.array_equal(res_for, res_gen))
# True
print(np.array_equal(res_for, res_reduce))
# True
print(np.array_equal(res_for, res_concat))
# True
print(np.array_equal(res_for, res_matmul))
# True
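np.array_equal requires the values to match exactly. The comparisons above are reported as True, but when floating-point terms are added in a different order the results can differ in the last bits, in which case a tolerance-based check such as np.allclose is the more forgiving test. A minimal illustration with made-up values:

```python
import numpy as np

a = np.array([0.1 + 0.2])
b = np.array([0.3])

exact_match = np.array_equal(a, b)   # bitwise comparison of the values
close_match = np.allclose(a, b)      # comparison within a small tolerance

print(exact_match, close_match)
```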

First run

# SUM: 21.569773197174072 seconds ---
# LIST: 23.576102018356323 seconds ---
# GEN: 21.385253429412842 seconds ---
# REDUCE: 21.426464080810547 seconds ---
# CONCAT: 21.059731483459473 seconds ---
# MATMUL: 23.57494807243347 seconds ---

Second run

# SUM: 21.6339168548584 seconds ---
# LIST: 19.767740488052368 seconds ---
# GEN: 23.86947798728943 seconds ---
# REDUCE: 19.880712032318115 seconds ---
# CONCAT: 20.761067152023315 seconds ---
# MATMUL: 23.55513620376587 seconds ---

Third run

# SUM: 22.764745473861694 seconds ---
# LIST: 19.953850984573364 seconds ---
# GEN: 24.37714171409607 seconds ---
# REDUCE: 22.54508638381958 seconds ---
# CONCAT: 21.20585823059082 seconds ---
# MATMUL: 22.303589820861816 seconds ---
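To compare the variants across the three runs, the timings listed above (rounded here to two decimals) can be summarized with the standard library. The dictionary layout and names are just one way to organize the numbers:

```python
import statistics

# Timings in seconds from the three runs above, rounded to two decimals.
runs = {
    "SUM":    [21.57, 21.63, 22.76],
    "LIST":   [23.58, 19.77, 19.95],
    "GEN":    [21.39, 23.87, 24.38],
    "REDUCE": [21.43, 19.88, 22.55],
    "CONCAT": [21.06, 20.76, 21.21],
    "MATMUL": [23.57, 23.56, 22.30],
}

# Map each variant to (mean, sample standard deviation).
summary = {name: (statistics.mean(times), statistics.stdev(times))
           for name, times in runs.items()}

for name, (mean, stdev) in summary.items():
    print("{}: mean {:.2f} s, std. dev. {:.2f} s".format(name, mean, stdev))
```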