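I am trying out the spark.ml library and its pipeline functionality. Splitting the data with SQL (for example, into training and test sets) seems to have limitations:
Regarding the generated model: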
Answer 0 (score: 2)
OK, regarding part 2 of the question:
How do I access the model weights? The lr optimizer and lr model internally has weights but it is unclear how to use them
After browsing the library's source code (with essentially no Scala knowledge), I found that
LogisticRegressionModel (spark.ml) has a weights attribute of type Vector.
Case 1
If you have a LogisticRegressionModel (spark.ml):
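// Fit the estimator on the training DataFrame to obtain the model
LogisticRegression lr = new LogisticRegression();
LogisticRegressionModel lr1 = lr.fit(df_train);
// weights() returns the fitted weights as a Vector
System.out.println("The weights are " + lr1.weights());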
Case 2
If you have a PipelineModel, first use getModel to obtain the LogisticRegressionModel (a Transformer):
LogisticRegression lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01);
Pipeline pipeline = new Pipeline().setStages(new PipelineStage[] { lr });
PipelineModel model = pipeline.fit(train_df);
// Look up the fitted model that corresponds to the lr stage
LogisticRegressionModel lrModel = model.getModel(lr);
System.out.println("The model weights are " + lrModel.weights());
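As for how the weights can be used once you have them: a rough sketch in plain Java (no Spark dependency) is below. It assumes a binary model that scores a row as sigmoid(weights · features + intercept). The class name LogisticScore, the method names, and the numbers are all made up for illustration; with a real model you would presumably copy the values out with something like weights().toArray() and intercept(), which I have not verified against every Spark version.

```java
// Hypothetical helper, not part of spark.ml: applies logistic regression
// weights to a single feature vector by hand.
class LogisticScore {

    // Logistic (sigmoid) function: maps any real-valued margin into (0, 1).
    public static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Probability of the positive class: sigmoid(weights . features + intercept).
    public static double predictProbability(double[] weights, double intercept, double[] features) {
        if (weights.length != features.length) {
            throw new IllegalArgumentException("weights and features must have the same length");
        }
        double margin = intercept;
        for (int i = 0; i < weights.length; i++) {
            margin += weights[i] * features[i];
        }
        return sigmoid(margin);
    }

    public static void main(String[] args) {
        // Illustrative values only; assumed to stand in for the fitted model's weights and intercept.
        double[] weights = {0.5, -0.25};
        double intercept = 0.1;
        double[] features = {2.0, 4.0};
        System.out.println("P(label = 1) = " + predictProbability(weights, intercept, features));
    }
}
```

A probability above the chosen threshold (commonly 0.5) would then be read as the positive class. This is only meant to show what the numbers mean; for actual scoring, the model's own transform is the safer route.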
Please let me know if this is incorrect or if there is a better way.