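I am running a linear regression using Spark Pipelines in pyspark. Once the linear regression model is trained, how do I get the coefficients out?
Here is my pipeline code: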
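from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

# Get all of our features together into one array called "features". Do not include the label!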
feature_assembler = VectorAssembler(inputCols=get_column_names(df_train), outputCol="features")
# Define our model
lr = LinearRegression(maxIter=100, elasticNetParam=0.80, labelCol="label", featuresCol="features",
                      predictionCol="prediction")
# Define our pipeline
pipeline_baseline = Pipeline(stages=[feature_assembler, lr])
# Train our model using the training data
model_baseline = pipeline_baseline.fit(df_train)
# Use our trained model to make predictions using the validation data
output_baseline = model_baseline.transform(df_val) #.select("features", "label", "prediction", "coefficients")
predictions_baseline = output_baseline.select("label", "prediction")
I tried using methods from the PipelineModel class. Here is how I tried to get the coefficients, but I only get an empty list and an empty dictionary:
params = model_baseline.stages[1].params
print('Try 1 - Parameters: %s' % (params,))
params = model_baseline.stages[1].extractParamMap()
print('Try 2 - Parameters: %s' % (params,))
Out[]:
Try 1 - Parameters: []
Try 2 - Parameters: {}
Does PipelineModel have a method that returns the trained coefficients?
Answer 0 (score: 4)
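You are looking at the wrong property. params can be used to extract the Params of an Estimator or Transformer, such as input or output columns (see the ML Pipeline parameters docs), not the estimated values.
For a LinearRegressionModel, use coefficients: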
model.stages[-1].coefficients
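To make the result easier to read, the coefficients can be paired with the column names that went into the VectorAssembler. The following is a minimal sketch, not part of the original answer. The helper name coefficients_by_feature is hypothetical, and it assumes the stage order from the question (assembler first, regression model last) and that every assembled input column is a scalar numeric column, so there is exactly one coefficient per column. It only reads attributes of the fitted model, so it needs no pyspark import of its own.

```python
def coefficients_by_feature(pipeline_model):
    """Map each assembled input column to its fitted coefficient.

    Assumptions (not confirmed by the original post): the first stage is a
    VectorAssembler whose input columns are all scalar numeric columns, and
    the last stage is a fitted LinearRegressionModel.
    """
    assembler = pipeline_model.stages[0]
    lr_model = pipeline_model.stages[-1]

    names = assembler.getInputCols()
    coefs = lr_model.coefficients
    # pyspark vectors expose toArray(); fall back to plain iteration otherwise.
    values = coefs.toArray() if hasattr(coefs, "toArray") else coefs
    values = [float(v) for v in values]

    if len(names) != len(values):
        raise ValueError(
            "expected one coefficient per input column, got %d columns and "
            "%d coefficients" % (len(names), len(values))
        )
    return dict(zip(names, values)), float(lr_model.intercept)


# Usage with the pipeline from the question:
# coefs, intercept = coefficients_by_feature(model_baseline)
# for name, value in sorted(coefs.items()):
#     print("%s: %s" % (name, value))
# print("intercept: %s" % intercept)
```

The length check is there because a vector-valued input column would contribute several coefficients, in which case a simple name-to-value mapping no longer applies.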