Question

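I am trying to run some basic linear algebra operations (specifically transpose, dot product, and inverse) on a matrix stored as a Spark RowMatrix, as described here (using the Python API). Following the example in the docs (in my case the matrix will have many more rows, hence the need for Spark), suppose I have a RowMatrix called mat.

Given such a distributed matrix, are there existing routines for matrix transpose and dot product, e.g.:

dot(mat.T,mat)

or a matrix inverse?

inverse(mat)

I can't seem to find anything about this in the documentation. I am looking for either (a) a pointer to the relevant docs or (b) a way to implement this myself.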

Answer 1

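As of now (Spark 1.6.0), the pyspark.mllib.linalg.distributed API is limited to basic operations such as counting rows/columns and converting between matrix types.

The Scala API supports a broader set of methods, including multiplication (RowMatrix.multiply, IndexedRowMatrix.multiply), transposition, SVD (IndexedRowMatrix.computeSVD), QR decomposition (RowMatrix.tallSkinnyQR), Gramian matrix computation (computeGramianMatrix), and PCA (RowMatrix.computePrincipalComponents), which can be used to implement more complex linear algebra functions.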
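To illustrate why a Gramian computation covers the dot(mat.T,mat) case from the question: for a matrix with rows r_i, mat.T times mat equals the sum of the outer products of each row with itself, so every row can be processed independently and the small n x n partial results added together. Below is a minimal pure-Python sketch of that idea, without Spark; the helper names outer, add, and gramian are hypothetical and are not part of any Spark API.

```python
from functools import reduce


def outer(row):
    """Outer product of a row with itself, as an n x n nested list."""
    return [[a * b for b in row] for a in row]


def add(m1, m2):
    """Element-wise sum of two equally sized nested lists."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def gramian(rows):
    """Compute mat.T * mat as the sum of per-row outer products."""
    return reduce(add, map(outer, rows))


# Same toy data as the documentation example: 4 rows, 3 columns.
rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(gramian(rows))
```

The same map-then-reduce shape should carry over to an RDD of rows (roughly rdd.map(outer).reduce(add)), although that is an untested assumption here. The result has one row and one column per column of mat, so it only fits on the driver when the matrix is tall and narrow.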

Answer 2

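In Spark 1.6 and later, you can do matrix arithmetic through the BlockMatrix class. Spark 1.6 provides only multiplication and addition; Spark 2.0 added more operations. At the time of writing, you have to implement the inverse manually, but dot product and transpose are available: https://github.com/apache/spark/blob/branch-2.0/python/pyspark/mllib/linalg/distributed.py#L811. Here is an example for Spark 1.6.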

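from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.mllib.linalg.distributed import IndexedRow, IndexedRowMatrix, BlockMatrix

sc = SparkContext()
# zipWithIndex pairs each row with its position: (row, index)
rows = sc.parallelize([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]) \
    .zipWithIndex()

# need a SQLContext() to generate an IndexedRowMatrix from RDD
sqlContext = SQLContext(sc)
# IndexedRow takes (index, vector), so swap the order of each pair
rows = IndexedRowMatrix(
    rows.map(lambda row: IndexedRow(row[1], row[0]))
).toBlockMatrix()

# <SOME OTHER BLOCK MATRIX> is a placeholder for a BlockMatrix with compatible dimensions
mat_product = rows.multiply(<SOME OTHER BLOCK MATRIX>)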
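On the inverse, which this answer says has to be implemented manually: one possible approach, offered here as an assumption rather than something the answer prescribes, is to invert locally once the matrix of interest is small and square (for example an n x n product such as mat.T times mat for a tall, narrow mat). The sketch below is plain Python with no Spark calls; invert is a hypothetical helper that uses Gauss-Jordan elimination with partial pivoting.

```python
def invert(matrix, eps=1e-12):
    """Invert a small square matrix (nested lists) by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [
        [float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < eps:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot entry becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]


inv = invert([[4, 7], [2, 6]])
print(inv)
```

This only makes sense when the matrix fits comfortably in driver memory. A singular input raises ValueError instead of returning a result.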

Basic linear algebra on Spark matrices

2 answers: