Apache Spark RDD Transpose

Asked: 2015-06-22 13:48:30

Tags: apache-spark transpose

I am running ETL over securities-market data to feed a self-organizing map. I want to transpose the row data:

AAME,20030101,1.63,1.63,1.63,1.63,0
AAON,20030101,5.4635,5.4635,5.4635,5.4635,0
AAPL,20030101,7.165,7.165,7.165,7.165,0
ABAX,20030101,3.96,3.96,3.96,3.96,0
...
ZUMZ,20131104,29.55,29.79,29.18,29.46,218100

into column data, one row per ticker:

AAME 1.63,1.65,...
AAON 5.4635,5.3,...

If I try to use a BlockMatrix, should I be using reduceByKey(extend) or reduceByKey(append)?

https://spark.apache.org/docs/latest/mllib-data-types.html#blockmatrix

E.g.:

val matA: BlockMatrix = coordMat.toBlockMatrix().cache()

// Validate whether the BlockMatrix is set up properly. Throws an Exception when it is not valid.
// Nothing happens if it is valid.
matA.validate()

// Calculate A^T A.
val ata = matA.transpose.multiply(matA)
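As a side note on the reduceByKey question above: neither `extend` nor `append` is a Spark operation (those are Python list methods); the usual shape is to key each row by ticker and aggregate the values per key. Below is a minimal sketch using plain Scala collections to illustrate that grouping logic; in Spark the same shape would be `rdd.map { case (t, d, p) => (t, (d, p)) }.groupByKey()`. The tickers and prices here are illustrative sample values, not the question's real data.

```scala
object TransposeSketch {
  def main(args: Array[String]): Unit = {
    // (ticker, date, closing price) — illustrative rows
    val rows = Seq(
      ("AAME", "20030101", 1.63),
      ("AAME", "20030102", 1.65),
      ("AAON", "20030101", 5.4635)
    )

    // Group prices per ticker, ordered by date — the collection analogue
    // of keying an RDD by ticker and aggregating values per key.
    val byTicker: Map[String, Seq[Double]] =
      rows.groupBy(_._1).map { case (ticker, rs) =>
        ticker -> rs.sortBy(_._2).map(_._3)
      }

    byTicker.toSeq.sortBy(_._1).foreach { case (t, prices) =>
      println(s"$t ${prices.mkString(",")}")
    }
  }
}
```

In Spark proper, `groupByKey` ships every value across the network; if the per-ticker vectors are large, a `reduceByKey` (or `aggregateByKey` with a growable buffer) combines partial results on each partition first, which is the closer analogue of what the question calls "append".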

0 Answers:

No answers yet