I am running ETL on stock market data to build a self-organizing map. I want to transpose row data:
AAME,20030101,1.63,1.63,1.63,1.63,0
AAON,20030101,5.4635,5.4635,5.4635,5.4635,0
AAPL,20030101,7.165,7.165,7.165,7.165,0
ABAX,20030101,3.96,3.96,3.96,3.96,0
...
ZUMZ,20131104,29.55,29.79,29.18,29.46,218100
into column data:
AAME 1.63,1.65,......
AAON 5.4635,5.3
If I try to use BlockMatrix, should I use ReduceByKey(extend) or ReduceByKey(append)?
https://spark.apache.org/docs/latest/mllib-data-types.html#blockmatrix
E.g.:
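import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix}

// coordMat is assumed to be an existing CoordinateMatrix, as in the linked docs example.
val matA: BlockMatrix = coordMat.toBlockMatrix().cache()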
// Validate whether the BlockMatrix is set up properly. Throws an Exception when it is not valid.
// Nothing happens if it is valid.
matA.validate()
// Calculate A^T A.
val ata = matA.transpose.multiply(matA)
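
For the row-to-column reshaping described above, one possible sketch skips BlockMatrix and groups the rows by symbol with the plain pair-RDD API. It rests on assumptions not stated in the post: the fields are symbol,date,open,high,low,close,volume, the close is the value wanted, and dates are yyyymmdd strings. The object name TransposeBySymbol, the helper closesBySymbol, and the sample rows are hypothetical.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object TransposeBySymbol {

  // Assumed field order (not stated in the post): symbol,date,open,high,low,close,volume.
  // Returns one record per symbol holding its closing prices ordered by date.
  def closesBySymbol(lines: RDD[String]): RDD[(String, Seq[Double])] =
    lines
      .map(_.split(","))
      .map(f => (f(0), (f(1), f(5).toDouble)))    // key by symbol, keep (date, close)
      .groupByKey()                                // gather every row of a symbol
      .mapValues(_.toSeq.sortBy(_._1).map(_._2))  // yyyymmdd strings sort chronologically

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("transpose-by-symbol")
      .getOrCreate()

    // Hypothetical sample rows in the same shape as the input above.
    val lines = spark.sparkContext.parallelize(Seq(
      "AAME,20030102,1.65,1.65,1.65,1.65,0",
      "AAME,20030101,1.63,1.63,1.63,1.63,0",
      "AAON,20030101,5.4635,5.4635,5.4635,5.4635,0"
    ))

    // One line per symbol, e.g. the symbol followed by its comma-separated closes.
    closesBySymbol(lines).collect().foreach { case (symbol, closes) =>
      println(s"$symbol ${closes.mkString(",")}")
    }

    spark.stop()
  }
}
```

groupByKey moves every value of a key to one executor, which is probably acceptable for a few thousand daily closes per symbol. extend and append are Python list methods, so in Scala the closest equivalent to a reduceByKey(extend) pattern would be aggregateByKey with a per-key buffer.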