Question

I have about 650 CSV-based matrices. I plan to load each of them with Numpy, as in the example below:

import numpy

m1 = numpy.loadtxt(open("matrix1.txt", "rb"), delimiter=",", skiprows=1)

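I need to process the files matrix2.txt, matrix3.txt, ..., matrix650.txt.

My end goal is to multiply all the matrices together. That means I don't necessarily need to keep 650 matrices in memory, only 2: the running product and the matrix I am currently multiplying it by.

Here is what I mean, with the matrices numbered from 1 to n as M1, M2, M3, ..., Mn:

M1 * M2 * M3 * ... * Mn

All the matrices have the same dimensions. They are not square: each has 197 rows and 11 columns. None of them is sparse, and every cell is populated.

What is the best / most efficient way to do this in Python?

Edit: I took the advice and got it working by transposing, since the matrices are not square. As an addendum to the question: is there a way in Numpy to do this element-wise?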
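Here is a minimal sketch of both problems, using random data instead of the CSV files. The arrays `a` and `b` are hypothetical stand-ins with the question's 197×11 shape. `np.dot` rejects two 197×11 operands because the inner dimensions (11 and 197) differ. The `*` operator multiplies cell by cell and keeps the shape:

```python
import numpy as np

# Hypothetical stand-ins for two of the CSV matrices (197 rows, 11 columns).
rng = np.random.default_rng(0)
a = rng.random((197, 11))
b = rng.random((197, 11))

# A matrix product needs a.shape[1] == b.shape[0]; here 11 != 197.
try:
    np.dot(a, b)
    dot_ok = True
except ValueError:
    dot_ok = False

# Element-wise product: same-shaped arrays are multiplied cell by cell.
elementwise = a * b  # equivalent to np.multiply(a, b)

print(dot_ok)             # False
print(elementwise.shape)  # (197, 11)
```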

Answer 1

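A Python 3 solution. Suppose "multiply each matrix against each other" really means multiplying them one after another in a chain. Suppose also that the matrices have compatible dimensions ((n, m) · (m, o) · (o, p) · ...), as your "(1 in progress, 1 ...)" suggests. In that case, use this (if your NumPy has it):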

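import numpy as np
from functools import partial
fnames = map("matrix{}.txt".format, range(1, 651))
# multi_dot needs a real sequence (it calls len()), so materialize the map first
np.linalg.multi_dot(list(map(partial(np.loadtxt, delimiter=',', skiprows=1), fnames)))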

Or:

import numpy as np
from functools import reduce, partial
fnames = map("matrix{}.txt".format, range(1, 651))
matrices = map(partial(np.loadtxt, delimiter=',', skiprows=1), fnames)
res = reduce(np.dot, matrices)

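`map` and friends are lazy in Python 3, so each file is read only when it is needed. With `reduce`, only the running product and the current matrix are in memory. `multi_dot` is different: it needs the whole list up front, because it chooses the multiplication order. `loadtxt` does not need a pre-opened file, and a filename works as well.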
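Below is a self-contained sketch of the `reduce` variant. It assumes CSV files with one header row, as in the question. To keep it runnable, it writes three small, chain-compatible matrices to a temporary directory. The shapes 3×4, 4×2 and 2×5 are hypothetical. The question's 197×11 files would first need the transposes mentioned in this answer. Because `map` is lazy, `reduce` only ever holds the running product and the matrix it just loaded:

```python
import os
import tempfile
from functools import partial, reduce

import numpy as np

# Hypothetical chain-compatible shapes: (3, 4) . (4, 2) . (2, 5) -> (3, 5).
rng = np.random.default_rng(1)
originals = [rng.random(s) for s in [(3, 4), (4, 2), (2, 5)]]

with tempfile.TemporaryDirectory() as tmp:
    fnames = []
    for i, m in enumerate(originals, start=1):
        path = os.path.join(tmp, "matrix{}.txt".format(i))
        # One header line, then comma-separated values, matching skiprows=1.
        np.savetxt(path, m, delimiter=",", header="header", comments="")
        fnames.append(path)

    loader = partial(np.loadtxt, delimiter=",", skiprows=1, ndmin=2)
    # map() is lazy: each file is read only when reduce() asks for it.
    result = reduce(np.dot, map(loader, fnames))

print(result.shape)  # (3, 5)
```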

To compute all pairwise combinations lazily, assuming the matrices have the same shape (this re-reads the data):

import numpy as np
from functools import partial
from itertools import starmap, combinations
# Turns a pair of filenames into a lazy pair of loaded matrices
map_loadtxt = partial(map, partial(np.loadtxt, delimiter=',', skiprows=1))
fname_combs = combinations(map("matrix{}.txt".format, range(1, 651)), 2)
res = list(starmap(np.dot, map(map_loadtxt, fname_combs)))

Using some grouping to reduce how often files are reloaded:

import numpy as np
from itertools import groupby, combinations, chain
from functools import partial
from operator import itemgetter

loader = partial(np.loadtxt, delimiter=',', skiprows=1)
fname_pairs = combinations(map("matrix{}.txt".format, range(1, 651)), 2)
# Pairs sharing the same first file are consecutive, so load that file once per group
groups = groupby(fname_pairs, itemgetter(0))
res = list(chain.from_iterable(
    map(loader(k).dot, map(loader, map(itemgetter(1), g)))
    for k, g in groups
))

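The matrices are not square, but they all have the same shape. You therefore have to add a transpose before multiplying so that the dimensions match, for example loader(k).T.dot or map(np.transpose, map(loader, ...)).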
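Which operand you transpose decides the shape of the result. Here is a minimal sketch with random 197×11 stand-ins, which are hypothetical data rather than the question's files. `A.T.dot(B)` gives an 11×11 result with 121 entries. `A.dot(B.T)` gives a 197×197 result with 38,809 entries. Across many pairs, the first form is much lighter on memory:

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(2)
# Hypothetical stand-ins for four of the 197x11 CSV matrices.
mats = [rng.random((197, 11)) for _ in range(4)]

small = mats[0].T.dot(mats[1])   # (11, 197) . (197, 11) -> (11, 11)
large = mats[0].dot(mats[1].T)   # (197, 11) . (11, 197) -> (197, 197)

# All pairwise products with the transpose on the left operand.
pairwise = [a.T.dot(b) for a, b in combinations(mats, 2)]

print(small.shape, large.shape, len(pairwise))  # (11, 11) (197, 197) 6
```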

If, on the other hand, the question is really about element-wise multiplication, replace np.dot with np.multiply.
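For the element-wise reading, `reduce(np.multiply, ...)` works the same way. Fed from lazy file loading, it keeps only the running product and the current array in memory. This sketch uses a generator over random arrays as a hypothetical stand-in for `map(loader, fnames)`. It then checks the result against `np.prod` over a stacked copy, which does need every array in memory at once:

```python
from functools import reduce

import numpy as np

rng = np.random.default_rng(3)
# Hypothetical stand-ins for the CSV matrices; all share one shape.
data = [rng.random((197, 11)) for _ in range(5)]

# Fed lazily, like map(loader, fnames): element-wise running product.
elementwise = reduce(np.multiply, (m for m in data))

# Reference: stack everything (more memory) and multiply along axis 0.
reference = np.prod(np.stack(data), axis=0)

print(elementwise.shape, np.allclose(elementwise, reference))  # (197, 11) True
```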

Answer 2

1. Variant: nice code, but reads all matrices at once

import itertools
import numpy as np

matrixFileCount = 3
matrices = [np.loadtxt("matrix%s.txt" % i, delimiter=",", skiprows=1) for i in range(1, matrixFileCount + 1)]
allC = itertools.combinations(range(matrixFileCount), 2)
allCMultiply = [np.dot(matrices[c[0]], matrices[c[1]]) for c in allC]
print(allCMultiply)

2. Variant: loads only 2 files at a time; nice code, but a lot of reloading

import itertools
import numpy as np

matrixFileCount = 3
allCMultiply = []
fileList = ["matrix%s.txt" % x for x in range(1, matrixFileCount + 1)]
allC = itertools.combinations(fileList, 2)
for c in allC:
    # Both files of every pair are (re)loaded from disk
    m = [np.loadtxt(file, delimiter=",", skiprows=1) for file in c]
    allCMultiply.append(np.dot(m[0], m[1]))
print(allCMultiply)

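3. Variant: like the second, but avoids reloading every time, while still keeping only 2 matrices in memory at any point

The combinations created by itertools come out in the order (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4). Consecutive pairs often share their first file, so you don't always have to load both matrices.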

import itertools
import numpy as np

matrixFileCount = 3
allCMultiply = []
mLoaded = {'file': None, 'matrix': None}
fileList = ["matrix%s.txt" % x for x in range(1, matrixFileCount + 1)]
allC = itertools.combinations(fileList, 2)
for c in allC:
    if c[0] == mLoaded['file']:
        # Same first file as the previous pair: reuse it, load only the second
        m = [mLoaded['matrix'], np.loadtxt(c[1], delimiter=",", skiprows=1)]
    else:
        m = [np.loadtxt(file, delimiter=",", skiprows=1) for file in c]
    mLoaded = {'file': c[0], 'matrix': m[0]}
    allCMultiply.append(np.dot(m[0], m[1]))
print(allCMultiply)

Performance

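If you can hold all the matrices in memory at once, variant 1 is faster than variant 2, because variant 2 reloads the matrices many times. Variant 3 is slower than variant 1 but faster than variant 2, because it sometimes avoids reloading a matrix.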

0.943613052368 (Variant 1: 10 matrices of shape 2×2, 1000 executions)
7.75622487068  (Variant 2: 10 matrices of shape 2×2, 1000 executions)
4.83783197403  (Variant 3: 10 matrices of shape 2×2, 1000 executions)
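The timing gap between variants 2 and 3 comes from how often files are read. This minimal sketch counts loads instead of reading real files; the `counting_loader` helper is hypothetical. For n files, variant 2 reads 2 files per pair, which is n(n-1) reads in total. Variant 3 caches the first file of each group of pairs, so it needs only n(n-1)/2 + (n-1) reads:

```python
from itertools import combinations

loads = {"count": 0}


def counting_loader(fname):
    """Hypothetical stand-in for np.loadtxt that only counts calls."""
    loads["count"] += 1
    return fname


def variant2_loads(n):
    loads["count"] = 0
    files = ["matrix%s.txt" % i for i in range(1, n + 1)]
    for pair in combinations(files, 2):
        for f in pair:  # both files of every pair are loaded
            counting_loader(f)
    return loads["count"]


def variant3_loads(n):
    loads["count"] = 0
    files = ["matrix%s.txt" % i for i in range(1, n + 1)]
    cached_name = None
    for first, second in combinations(files, 2):
        if first != cached_name:  # first file changed: load and cache it
            cached_name = first
            counting_loader(first)
        counting_loader(second)
    return loads["count"]


print(variant2_loads(10), variant3_loads(10))  # 90 54
```

For the question's 650 files, this comes to 421,850 reads for variant 2 versus 211,574 for variant 3.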

Answer 3

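Kordi's answer loads all the matrices before doing the multiplication. That's fine if you know the matrices will be small. If you want to save memory, though, do this instead: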

from functools import reduce

import numpy as np

def get_dot_product(fnames):
    assert len(fnames) > 0
    accum_val = np.loadtxt(fnames[0], delimiter=',', skiprows=1)
    # The starting value is passed to reduce() positionally
    return reduce(_product_from_file, fnames[1:], accum_val)

def _product_from_file(running_product, fname):
    # Load one file at a time and fold it into the running product
    return running_product.dot(np.loadtxt(fname, delimiter=',', skiprows=1))

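If the matrices are large and irregularly shaped (not square), there are also optimization algorithms for finding the best associative grouping, that is, where to put the parentheses. In most cases, though, I doubt it's worth the overhead of loading and unloading each file twice: once to work out the grouping and once more to do the multiplication. NumPy is very fast even on fairly large matrices.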
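The "best associative grouping" is the classic matrix-chain ordering problem. Below is a minimal dynamic-programming sketch that needs only the matrix shapes, not the data; the function name `chain_cost` is hypothetical. `np.linalg.multi_dot` solves the same ordering problem internally. For shapes 10×100, 100×5 and 5×50, (AB)C costs 7,500 scalar multiplications, whereas A(BC) costs 75,000:

```python
def chain_cost(dims):
    """Minimum number of scalar multiplications for a matrix chain.

    dims has length k + 1; matrix i has shape (dims[i], dims[i + 1]).
    """
    k = len(dims) - 1
    # cost[i][j]: cheapest way to multiply matrices i..j (inclusive).
    cost = [[0] * k for _ in range(k)]
    for length in range(2, k + 1):
        for i in range(k - length + 1):
            j = i + length - 1
            # Try every split point s: (i..s) times (s+1..j).
            cost[i][j] = min(
                cost[i][s] + cost[s + 1][j] + dims[i] * dims[s + 1] * dims[j + 1]
                for s in range(i, j)
            )
    return cost[0][k - 1]


print(chain_cost([10, 100, 5, 50]))  # 7500
```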

Answer 4

How about a really simple solution that avoids map, reduce and the like? With `*`, plain numpy arrays multiply element-wise by default.

import numpy

size = (197, 11)

# Start from ones so the first element-wise multiplication leaves the data unchanged
result = numpy.ones(size)
for i in range(1, 651):
    result *= numpy.loadtxt("matrix{}.txt".format(i), delimiter=",", skiprows=1)

Multiple matrix multiplication with Numpy
