I used a PCA model on Spark for dimensionality reduction, but it fails with the following error:

Time: 2016-01-13 07:49:10

Tags: apache-spark pca

16/01/13 15:34:07 INFO DAGScheduler: Job 3 finished: first at RowMatrix.scala:65, took 0.013421 s
Exception in thread "main" java.lang.IllegalArgumentException: Argument with more than 65535 cols: 262144
    at org.apache.spark.mllib.linalg.distributed.RowMatrix.checkNumColumns(RowMatrix.scala:135)
    at org.apache.spark.mllib.linalg.distributed.RowMatrix.computeCovariance(RowMatrix.scala:330)
    at org.apache.spark.mllib.linalg.distributed.RowMatrix.computePrincipalComponents(RowMatrix.scala:386)
    at org.apache.spark.mllib.feature.PCA.fit(PCA.scala:46)
    at org.apache.spark.mllib.clustering.KMeansPca$delayedInit$body.apply(KMeansPca.scala:41)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
    at scala.App$class.main(App.scala:71)
    at org.apache.spark.mllib.clustering.KMeansPca$.main(KMeansPca.scala:12)
    at org.apache.spark.mllib.clustering.KMeansPca.main(KMeansPca.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
16/01/13 15:34:07 INFO SparkContext: Invoking stop() from shutdown hook

How can I fix this problem?

1 answer:

Answer 0 (score: 1):

The error means you passed in a matrix with too many columns: more than 65535 (yours has 262144).
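
You can confirm this by checking the column count of your data before calling PCA. A minimal sketch, where `vectors` is a placeholder for the RDD[Vector] you feed into PCA.fit:

    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.mllib.linalg.distributed.RowMatrix
    import org.apache.spark.rdd.RDD

    // `vectors: RDD[Vector]` is a placeholder for the data passed to PCA.fit.
    val mat = new RowMatrix(vectors)
    println(s"numCols = ${mat.numCols()}")  // would print 262144 in your case

    // computePrincipalComponents builds a dense numCols x numCols covariance
    // matrix, which MLlib only allows for at most 65535 columns.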

Check the code here and note the comment: "Note that this cannot be computed on matrices with more than 65535 columns."

So you must make sure the number of columns in your matrix does not exceed 65535, for example by reducing the feature dimension before running PCA (see the sketch below).
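
For instance, if your 262144 columns come from feature hashing (262144 = 2^18, a common hashing dimension), one workaround is to hash into fewer buckets so the matrix fits under the limit. A sketch under that assumption; `docs` and the parameter choices are hypothetical:

    import org.apache.spark.mllib.feature.{HashingTF, PCA}
    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.rdd.RDD

    // `docs: RDD[Seq[String]]` is a hypothetical RDD of tokenized documents.
    val tf = new HashingTF(numFeatures = 1 << 15)  // 32768 columns, under the 65535 limit
    val vectors: RDD[Vector] = tf.transform(docs)
    vectors.cache()  // reused by both fit and transform below

    // With at most 65535 columns, the covariance computation now succeeds.
    val pca = new PCA(k = 50).fit(vectors)
    val reduced: RDD[Vector] = pca.transform(vectors)

Any technique that caps the feature dimension works here; the key point is simply that the input to PCA.fit must have at most 65535 columns.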