Using mllib

Date: 2017-10-17 06:28:29

Tags: scala apache-spark apache-spark-mllib matrix-multiplication

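I am using Spark 1.6 on YARN. I have a job that uses Spark mllib for some computations, one of which is matrix multiplication, which I do with CoordinateMatrix. The code looks like this: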

def coordinateMatrixMultiply(leftMatrix: CoordinateMatrix, rightMatrix: CoordinateMatrix): CoordinateMatrix = {
  val M_ = leftMatrix.entries.map({ case MatrixEntry(i, j, v) => (j, (i, v)) })
  val N_ = rightMatrix.entries.map({ case MatrixEntry(j, k, w) => (j, (k, w)) })
  val productEntries = M_.join(N_)
    .map({ case (_, ((i, v), (k, w))) => ((i, k), (v * w)) })
    .reduceByKey(_ + _)
    .map({ case ((i, k), sum) => MatrixEntry(i, k, sum) })
  new CoordinateMatrix(productEntries)
}

But I get an error that says:

java.lang.IllegalArgumentException: requirement failed: Both matrices must have the same number of rows. A.numRows: 159, B.numRows: 158
    at scala.Predef$.require(Predef.scala:224)
    at org.apache.spark.mllib.linalg.distributed.BlockMatrix.blockMap(BlockMatrix.scala:359)
    at org.apache.spark.mllib.linalg.distributed.BlockMatrix.add(BlockMatrix.scala:397)
    at com.sankuai.nlpml.kg.syn_sim.SynSim$.process(SynSim.scala:312)
    at com.sankuai.nlpml.kg.syn_sim.SynSim$.main(SynSim.scala:365)
    at com.sankuai.nlpml.kg.syn_sim.SynSim.main(SynSim.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)

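I have submitted the job many times, but not every run throws this exception. I debugged the code and found that the value returned by coordinateMatrixMultiply differs between runs even though the code is unchanged. I don't know why, and I don't know how to fix it. Can anyone help me?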

1 answer:

Answer 0: (score: 0)

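Instead of implementing the multiplication yourself, consider converting to BlockMatrix and using the provided multiply method. After multiplying, convert back to CoordinateMatrix: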

def coordinateMatrixMultiply(leftMatrix: CoordinateMatrix, rightMatrix: CoordinateMatrix) =
  leftMatrix.toBlockMatrix().multiply(rightMatrix.toBlockMatrix()).toCoordinateMatrix()
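
One plausible cause of the mismatch in the stack trace (159 rows vs. 158), though the question does not confirm it: the one-argument constructor new CoordinateMatrix(entries) infers numRows and numCols from the largest indices present in the entries. If the last row or column of a product has no entries, the inferred shape comes out one smaller, and a later BlockMatrix.add rejects it. If you keep the join-based multiplication, a minimal sketch of a workaround is to pass the dimensions explicitly. The name coordinateMatrixMultiplySized is made up for illustration, and the sketch assumes both inputs already report their true dimensions.

```scala
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

// Hypothetical helper (not from the question or the answer): same join-based
// product as in the question, but the result keeps an explicit shape.
def coordinateMatrixMultiplySized(
    leftMatrix: CoordinateMatrix,
    rightMatrix: CoordinateMatrix): CoordinateMatrix = {
  // Assumption: the inputs were built with their real dimensions, so the
  // product is numRows(left) x numCols(right).
  val rows = leftMatrix.numRows()
  val cols = rightMatrix.numCols()

  // Key both sides by the shared inner index j.
  val left = leftMatrix.entries.map { case MatrixEntry(i, j, v) => (j, (i, v)) }
  val right = rightMatrix.entries.map { case MatrixEntry(j, k, w) => (j, (k, w)) }

  val productEntries = left.join(right)
    .map { case (_, ((i, v), (k, w))) => ((i, k), v * w) }
    .reduceByKey(_ + _)
    .map { case ((i, k), sum) => MatrixEntry(i, k, sum) }

  // The three-argument constructor takes the shape as given instead of
  // deriving it from the largest row/column index in productEntries.
  new CoordinateMatrix(productEntries, rows, cols)
}
```

The same caveat applies to the BlockMatrix route: toBlockMatrix() takes its shape from the CoordinateMatrix, so the input matrices should also be constructed with explicit dimensions wherever they are first created.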