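I am using the LinearRegression estimator in a pipeline.
In my original setup, I trained the model on a dataset with 3774 rows and 500 features. Spark handled this task without errors.
However, I am running into a problem with a new training dataset that has 6072 rows but the same features. When training the model, I get the following error: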
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed
at scala.Predef$.require(Predef.scala:212)
at breeze.optimize.OWLQN$$anonfun$3.apply(OWLQN.scala:95)
at breeze.optimize.OWLQN$$anonfun$3.apply(OWLQN.scala:93)
at breeze.linalg.DenseVector$CanZipMapKeyValuesDenseVector.map(DenseVector.scala:563)
at breeze.linalg.DenseVector$CanZipMapKeyValuesDenseVector.mapActive(DenseVector.scala:571)
at breeze.linalg.DenseVector$CanZipMapKeyValuesDenseVector.mapActive(DenseVector.scala:554)
at breeze.optimize.OWLQN.adjust(OWLQN.scala:93)
at breeze.optimize.FirstOrderMinimizer.initialState(FirstOrderMinimizer.scala:49)
at breeze.optimize.FirstOrderMinimizer.iterations(FirstOrderMinimizer.scala:89)
at org.apache.spark.ml.optim.QuasiNewtonSolver.solve(NormalEquationSolver.scala:103)
at org.apache.spark.ml.optim.WeightedLeastSquares.fit(WeightedLeastSquares.scala:268)
at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:215)
at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:76)
at org.apache.spark.ml.Predictor.fit(Predictor.scala:96)
at lasso.Lasso.main(Lasso.java:279)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
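Only the number of observations changed, not the number of features. Is there a known limit on the input size that Linear Regression can handle?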