Limitations of Spark ML linear regression

Asked: 2017-02-27 20:05:47

Tags: apache-spark apache-spark-ml

I am using the LinearRegression estimator in a pipeline.

In my original setup, I trained the model on a dataset with 3774 rows and 500 features. Spark handled this task without errors.

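However, I ran into a problem with a new training dataset that has 6072 rows but the same 500 features. When training the model, I get the following error: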

Exception in thread "main" java.lang.IllegalArgumentException: requirement failed
    at scala.Predef$.require(Predef.scala:212)
    at breeze.optimize.OWLQN$$anonfun$3.apply(OWLQN.scala:95)
    at breeze.optimize.OWLQN$$anonfun$3.apply(OWLQN.scala:93)
    at breeze.linalg.DenseVector$CanZipMapKeyValuesDenseVector.map(DenseVector.scala:563)
    at breeze.linalg.DenseVector$CanZipMapKeyValuesDenseVector.mapActive(DenseVector.scala:571)
    at breeze.linalg.DenseVector$CanZipMapKeyValuesDenseVector.mapActive(DenseVector.scala:554)
    at breeze.optimize.OWLQN.adjust(OWLQN.scala:93)
    at breeze.optimize.FirstOrderMinimizer.initialState(FirstOrderMinimizer.scala:49)
    at breeze.optimize.FirstOrderMinimizer.iterations(FirstOrderMinimizer.scala:89)
    at org.apache.spark.ml.optim.QuasiNewtonSolver.solve(NormalEquationSolver.scala:103)
    at org.apache.spark.ml.optim.WeightedLeastSquares.fit(WeightedLeastSquares.scala:268)
    at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:215)
    at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:76)
    at org.apache.spark.ml.Predictor.fit(Predictor.scala:96)
    at lasso.Lasso.main(Lasso.java:279)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

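The only thing that changed is the number of observations, not the number of features. Is there a known limit on the input size that LinearRegression can handle?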
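Since only the rows differ between the two datasets, one thing that may be worth ruling out is bad values in the new rows rather than a size limit. The following is a minimal plain-Java sketch, not Spark API, that scans labels and feature rows for NaN or infinite values before they are handed to the estimator. The class name `FiniteCheck`, the method `rowsWithNonFiniteValues`, and the array layout are hypothetical, and it is only an assumption that non-finite values could trigger the failed `require` in `OWLQN.adjust`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Spark or Breeze.
class FiniteCheck {

    // Returns the indices of rows whose label or any feature is NaN or infinite.
    static List<Integer> rowsWithNonFiniteValues(double[] labels, double[][] features) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < features.length; i++) {
            boolean ok = Double.isFinite(labels[i]);
            for (double v : features[i]) {
                if (!Double.isFinite(v)) {
                    ok = false;
                    break;
                }
            }
            if (!ok) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Tiny made-up sample: row 1 has a NaN label, row 2 an infinite feature.
        double[] labels = {1.0, Double.NaN, 3.0};
        double[][] features = {
            {0.5, 1.5},
            {2.0, 2.5},
            {Double.POSITIVE_INFINITY, 0.0}
        };
        System.out.println(rowsWithNonFiniteValues(labels, features));
    }
}
```

If this reports no rows for the 6072-row dataset, the data would at least be free of non-finite values, and the cause would have to lie elsewhere.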

0 answers:

No answers yet.