I need to compute the composition of many 3x3 rotation matrices.
Here is a comparison of functools.reduce with matmul, applied to numpy and to cupy arrays:
import timeit
from functools import reduce
import numpy as np
import cupy as cp
from pyrr.matrix33 import create_from_axis_rotation
# generate random rotation matrices
axes = np.random.rand(10000, 3)
angles = np.pi * np.random.rand(10000)
rotations = [create_from_axis_rotation(*params) for params in zip(axes, angles)]
# then reduce with matmul
xp = np # numpy
xp_rotations = [xp.asarray(rotation) for rotation in rotations]
timexp = timeit.timeit("reduce(xp.matmul, xp_rotations)", number=10, globals=globals())
print(f"{xp.__name__}: {timexp * 1000:0.3}ms")
xp = cp # cupy
xp_rotations = [xp.asarray(rotation) for rotation in rotations]
timexp = timeit.timeit("reduce(xp.matmul, xp_rotations)", number=10, globals=globals())
print(f"{xp.__name__}: {timexp * 1000:0.3}ms")
On a decent machine with a Titan GPU, this gives:
numpy: 1.63e+02ms
cupy: 8.78e+02ms
For some reason, the GPU version is much slower.
In any case, is there a way to compute this significantly faster?
I found a fairly simple solution that works for any chain of small linear transformations (and can easily be extended to affine transformations).
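def reduce_loop(matrices):
    """Non-optimized reduce: multiply the matrices one by one, left to right."""
    mat = matrices[0]
    for _mat in matrices[1:]:
        mat = mat @ _mat
    return mat

def reduce_split(matrices):
    """Reduce by multiplying pairs of neighbouring matrices recursively."""
    if isinstance(matrices, (list, tuple)):
        # stepped slicing and batched @ need an (n, 3, 3) array, not a list
        matrices = np.stack(matrices)
    if len(matrices) == 1:
        return matrices[0]
    neven = (len(matrices) // 2) * 2
    # one batched matmul multiplies each even-indexed matrix by its right neighbour
    reduced = matrices[:neven:2] @ matrices[1:neven:2]
    if len(matrices) > neven:  # len(matrices) is odd
        reduced[-1] = reduced[-1] @ matrices[-1]
    return reduce_split(reduced)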
time = timeit.timeit("reduce_loop(rotations)", number=10, globals=globals())
print(f"reduce_loop: {time * 1000:0.3}ms")
time = timeit.timeit("reduce_split(rotations)", number=10, globals=globals())
print(f"reduce_split: {time * 1000:0.3}ms")
This gives:
reduce_loop: 2.14e+02ms
reduce_split: 24.5ms
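The regrouping is only valid because matrix multiplication is associative, so the order of the factors is kept while the parentheses move. A minimal sketch of that check follows; `rot_z`, `rot_x` and `count_rounds` are helper names made up for this illustration. It also counts how many batched rounds the pairwise scheme needs when each round halves the number of matrices and folds an odd leftover into the last product.

```python
import numpy as np

def rot_z(angle):
    """Rotation about the z axis (illustrative helper)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(angle):
    """Rotation about the x axis (illustrative helper)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

a, b, c, d = rot_z(0.3), rot_x(1.1), rot_z(-0.7), rot_x(2.0)

# Same left-to-right order of factors, different grouping.
sequential = ((a @ b) @ c) @ d
pairwise = (a @ b) @ (c @ d)
same = np.allclose(sequential, pairwise)

def count_rounds(n):
    """Number of halving rounds until a single matrix is left."""
    rounds = 0
    while n > 1:
        n //= 2  # an odd leftover is merged into the last product
        rounds += 1
    return rounds

rounds_for_10000 = count_rounds(10000)
print(same, rounds_for_10000)
```

So the Python-level work drops from 9999 separate matmul calls to a handful of batched ones, which is presumably where most of the speed-up comes from.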
I'm sure it is not optimal, but it does take advantage of numpy's (and possibly cupy's) optimizations.
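For the affine extension mentioned above, one possible sketch (my own assumption about how to do it, not tested on a GPU) is to pack each transform x -> R x + t into a 4x4 homogeneous matrix and run the same pairwise scheme on the (n, 4, 4) stack. `to_homogeneous` and `reduce_pairwise` are hypothetical helper names, and random orthogonal matrices from a QR decomposition stand in for rotations.

```python
import numpy as np

def to_homogeneous(linear, translation):
    """Pack x -> linear @ x + translation into one 4x4 matrix."""
    mat = np.eye(4)
    mat[:3, :3] = linear
    mat[:3, 3] = translation
    return mat

def reduce_pairwise(matrices):
    """Multiply an (n, k, k) stack left to right, pairing neighbours each round."""
    matrices = np.asarray(matrices)
    while len(matrices) > 1:
        neven = (len(matrices) // 2) * 2
        reduced = matrices[:neven:2] @ matrices[1:neven:2]
        if len(matrices) > neven:  # odd count: fold the leftover into the last product
            reduced[-1] = reduced[-1] @ matrices[-1]
        matrices = reduced
    return matrices[0]

rng = np.random.default_rng(0)
n = 101
# Random orthogonal 3x3 blocks stand in for rotation matrices here.
linears = [np.linalg.qr(rng.normal(size=(3, 3)))[0] for _ in range(n)]
translations = rng.normal(size=(n, 3))

affines = np.stack([to_homogeneous(l, t) for l, t in zip(linears, translations)])
combined = reduce_pairwise(affines)

# Reference: apply the transforms to a point one by one, rightmost first.
point = np.array([1.0, 2.0, 3.0])
expected = point.copy()
for l, t in zip(linears[::-1], translations[::-1]):
    expected = l @ expected + t

result = (combined @ np.append(point, 1.0))[:3]
print(np.allclose(result, expected))
```

The bottom row of every product stays [0, 0, 0, 1], so the result is again an affine transform.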
Answer 0: (score: 1)
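reduce() is no longer a Python built-in: in Python 3 it was moved into the functools module, largely because an explicit loop was considered more readable. There is no CuPy equivalent, only the host version in the functools library.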
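Your CuPy code spends most of its time on per-call overhead rather than on computation. The arrays are copied to the device before the timed section, but reduce() runs only on the host, so it issues thousands of tiny 3x3 matmul calls to the GPU one at a time. You are mostly measuring launch latency, not GPU throughput.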
Consider turning the list "rotations" into a single CuPy array and then using strides (rather than a Python list).
Use a CuPy reduction kernel to perform the matmul: https://docs.cupy.dev/en/stable/reference/generated/cupy.ReductionKernel.html
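As a rough sketch of the stacking suggestion above: build one (n, 3, 3) array, take strided views of the even and odd entries, and multiply them with a single batched matmul. Here `xp` is set to numpy so the snippet runs without a GPU. The assumption is that setting `xp = cupy` works unchanged, since CuPy provides `asarray`, strided slicing and batched `matmul` with the same interface. The random orthogonal matrices are only stand-ins for the rotations in the question.

```python
import numpy as np

xp = np  # assumption: xp = cupy should work the same way for the calls below

rng = np.random.default_rng(1)
# Stand-ins for the rotation matrices from the question.
rotations = [np.linalg.qr(rng.normal(size=(3, 3)))[0] for _ in range(8)]

# One transfer of one contiguous array instead of one small array per matrix.
stacked = xp.asarray(np.stack(rotations))  # shape (8, 3, 3)

# Basic slicing with a step returns strided views, no data is copied.
left = stacked[0::2]   # matrices 0, 2, 4, 6
right = stacked[1::2]  # matrices 1, 3, 5, 7

# A single batched call computes all four neighbouring products.
products = xp.matmul(left, right)  # shape (4, 3, 3)
print(products.shape)
```

Repeating this step on `products` until one matrix remains gives the full composition with only about log2(n) calls from the host.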