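I want to understand what the coefficient matrix is in PySpark's logistic regression (specifically with lasso). This is a logistic regression, so I assumed the weights would simply be a 1xn row vector for n features.
Also, how can I map the features to their coefficients to see which features have a coefficient of 0?
Note: I am using MulticlassClassificationEvaluator because BinaryClassificationEvaluator does not offer weighted recall. I assume that is not a problem?
I am getting a sparse matrix. I am doing binary classification with 23 features, but the result is a 3x23 sparse matrix. Shouldn't it be 1x23?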
from pyspark.ml.evaluation import BinaryClassificationEvaluator, MulticlassClassificationEvaluator
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator

# Weighted recall is the reason for using the multiclass evaluator
evaluator = MulticlassClassificationEvaluator(metricName="weightedRecall",
                                              predictionCol="prediction",
                                              labelCol="label")
lr = LogisticRegression(labelCol="label",
                        featuresCol="features", weightCol="classWeights")

# elasticNetParam=1.0 selects a pure L1 (lasso) penalty
paramGrid = (ParamGridBuilder()
             .addGrid(lr.elasticNetParam, [1.0])
             .addGrid(lr.maxIter, [10])
             .addGrid(lr.regParam, [0.01, 0.5, 2.0])
             .build())
cv = CrossValidator(estimator=lr, estimatorParamMaps=paramGrid,
                    evaluator=evaluator, numFolds=2)
%time cvModel = cv.fit(train_df)
predict_test_hyp = cvModel.transform(test_df)
best = cvModel.bestModel  # assumed: `best` is the best model found by cross-validation
coef = best.coefficientMatrix
Output when I convert it to a dense matrix:
DenseMatrix([
[ 5.66693393e-01, 0.00000000e+00, -8.52316465e-09,
6.64542431e-03, 0.00000000e+00, 5.34390416e-02,
-4.51579298e-02, 0.00000000e+00, 0.00000000e+00,
-4.51579298e-02, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 3.16000659e-02, 9.72526723e-01,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, -4.70342863e-03,
0.00000000e+00, -1.75045505e-01],
[-2.41420676e-01, 1.67133402e-07, 8.30360611e-08,
-6.51168658e-03, 1.04331113e+00, 2.14565081e-01,
4.11803774e-01, 0.00000000e+00, 9.97492835e-08,
4.11803774e-01, 0.00000000e+00, 0.00000000e+00,
2.56259268e-01, 2.52040849e-02, -8.22050592e-01,
6.76655408e-01, 0.00000000e+00, 1.37646488e-01,
0.00000000e+00, 4.33575071e-02, 6.79627660e-03,
3.14889764e-01, 4.54933918e-01],
[-1.09990806e-01, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, -6.04906840e-01, -4.63173578e-01,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, -8.73277227e-02, 0.00000000e+00,
-1.98500866e-01, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, -3.00089689e-01, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00]])
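For the feature-to-coefficient mapping asked about above, here is a minimal sketch in plain Python. It assumes the matrix has already been turned into nested lists (for example with `best.coefficientMatrix.toArray().tolist()`) and that the feature names are available in the same order as the columns of the `features` vector. `zero_coefficients`, `feature_names`, the `tol` parameter, and the toy values are all hypothetical names and data for illustration, not anything provided by Spark. As far as I understand, each row of the matrix corresponds to one class, so the helper reports zeros row by row.

```python
def zero_coefficients(coef_rows, feature_names, tol=0.0):
    """For each row of the coefficient matrix, list the features whose
    coefficient is zero (or within `tol` of zero)."""
    result = []
    for row in coef_rows:
        if len(row) != len(feature_names):
            raise ValueError("row length does not match number of feature names")
        result.append([name for name, c in zip(feature_names, row)
                       if abs(c) <= tol])
    return result


# Toy stand-in for the real inputs. With Spark one might build them as
#   coef_rows = best.coefficientMatrix.toArray().tolist()
#   feature_names = assembler.getInputCols()  # hypothetical VectorAssembler
feature_names = ["f0", "f1", "f2", "f3"]
coef_rows = [
    [0.57, 0.0, -8.5e-09, 0.0],
    [-0.24, 1.7e-07, 0.0, 0.0],
]

zeros = zero_coefficients(coef_rows, feature_names)
for class_index, names in enumerate(zeros):
    print(class_index, names)
```

Passing a small tolerance such as `tol=1e-6` also treats near-zero values like -8.5e-09 as dropped. Note that the name list only lines up with the columns if every input column contributes exactly one slot to the feature vector; one-hot encoded or vector-valued inputs would need their expanded names instead.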