Logistic regression coefficient matrix in pyspark

Time: 2019-06-29 19:49:45

Tags: apache-spark pyspark regression logistic-regression coefficients

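I want to understand what the coefficient matrix is in pyspark's logistic regression (specifically with a lasso penalty). Since this is a logistic regression, I expected the weights to be simply a 1xn vector, one weight for each of the n features.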
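For reference, the two model forms that Spark's `LogisticRegression` can fit, selected by its `family` parameter (`"auto"`, `"binomial"` or `"multinomial"`), differ in the shape of their coefficients. The notation below is illustrative, not taken from the Spark docs:

```latex
% Binomial (family="binomial"): a single row of n coefficients
P(y = 1 \mid x) = \sigma(\beta^{\top} x + b), \qquad
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \beta \in \mathbb{R}^{n}

% Multinomial (family="multinomial"): one row B_k per class k = 0, ..., K-1
P(y = k \mid x) = \frac{\exp(B_k x + b_k)}{\sum_{j=0}^{K-1} \exp(B_j x + b_j)}, \qquad
B \in \mathbb{R}^{K \times n}
```

In the binomial case `coefficientMatrix` is 1xn; in the multinomial case it is Kxn, with row k belonging to class k.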

Also, how can I map the features to their coefficients, so that I can see which features have a coefficient of 0?
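A minimal sketch, assuming the `features` column was produced by a `VectorAssembler`, so that it carries `ml_attr` metadata (readable in pyspark as the dict `train_df.schema["features"].metadata`), and that `coef.toArray()` gives the coefficient matrix as a NumPy array. The helper names `feature_names_from_metadata` and `coefficients_by_feature` are hypothetical, and the metadata layout is an assumption; the code itself is plain Python/NumPy so it can be tried without Spark.

```python
import numpy as np


def feature_names_from_metadata(metadata):
    """Return feature names ordered by their index in the feature vector.

    Assumes the VectorAssembler-style layout:
    {"ml_attr": {"attrs": {"numeric": [{"idx": 0, "name": ...}, ...],
                           "binary": [...], ...}}}
    """
    attrs = metadata["ml_attr"]["attrs"]
    # Attributes are grouped by type; merge all groups and sort by index
    entries = [a for group in attrs.values() for a in group]
    return [a["name"] for a in sorted(entries, key=lambda a: a["idx"])]


def coefficients_by_feature(coef_matrix, names, tol=0.0):
    """Pair each feature name with its column of coefficients.

    Returns (mapping, zero_features): mapping[name] lists that feature's
    coefficient in every row, and zero_features lists the features whose
    |coefficient| <= tol in every row.
    """
    coef = np.asarray(coef_matrix, dtype=float)
    if coef.ndim == 1:
        coef = coef.reshape(1, -1)  # treat a plain coefficient vector as one row
    if coef.shape[1] != len(names):
        raise ValueError("number of names must equal number of coefficient columns")
    mapping = {name: coef[:, j].tolist() for j, name in enumerate(names)}
    zero_features = [name for j, name in enumerate(names)
                     if np.all(np.abs(coef[:, j]) <= tol)]
    return mapping, zero_features


# Made-up metadata and coefficients, for illustration only
meta = {"ml_attr": {"attrs": {"numeric": [{"idx": 0, "name": "age"},
                                          {"idx": 2, "name": "income"}],
                              "binary": [{"idx": 1, "name": "is_member"}]}}}
names = feature_names_from_metadata(meta)
mapping, zeros = coefficients_by_feature([[0.5, 0.0, -1.2]], names)
print(names)  # ['age', 'is_member', 'income']
print(zeros)  # ['is_member']
```

With the model from the question, the calls would look like `coefficients_by_feature(best.coefficientMatrix.toArray(), feature_names_from_metadata(train_df.schema["features"].metadata))`. A small positive `tol` is useful because lasso fits can leave residues such as -8.5e-09 instead of exact zeros.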

Note: I am using MulticlassClassificationEvaluator because BinaryClassificationEvaluator does not offer weighted recall. I assume this is not a problem?
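On the evaluator: as far as I understand Spark's multiclass metrics, weighted recall is each class's recall weighted by that class's share of the true labels. Those weights cancel the per-class denominators, so the metric reduces to plain accuracy. The plain-Python sketch below (function names are hypothetical) checks this identity on made-up labels; it is worth keeping in mind when `weightedRecall` is the cross-validation metric on imbalanced data.

```python
from collections import Counter


def weighted_recall(labels, predictions):
    """Recall of each class, weighted by how often that class occurs in labels."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    n = len(labels)
    support = Counter(labels)  # number of true examples per class
    total = 0.0
    for cls, count in support.items():
        true_pos = sum(1 for y, p in zip(labels, predictions) if y == cls and p == cls)
        recall = true_pos / count
        total += recall * count / n  # weight = the class's share of all labels
    return total


def accuracy(labels, predictions):
    return sum(1 for y, p in zip(labels, predictions) if y == p) / len(labels)


# Made-up, imbalanced binary example
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1]
print(weighted_recall(y_true, y_pred))  # 0.875
print(accuracy(y_true, y_pred))         # 0.875
```

Here the minority class has a recall of only 0.5, yet the weighted recall matches the accuracy of 0.875.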

I am getting a sparse matrix. I am doing binary classification with 23 features, but I get a 3x23 sparse matrix. Shouldn't it be 1x23?
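One possible explanation, assuming the label column carries no class-count metadata: Spark infers the number of classes as the largest label value plus one, and with the default `family="auto"` it fits a multinomial model (one coefficient row per class) whenever that number exceeds 2. Labels encoded as, for example, 1 and 2 instead of 0 and 1 would then give 3 classes and a 3x23 `coefficientMatrix`. The function below is a hypothetical plain-Python sketch of that rule, not Spark's code; the `numClasses` property of the fitted model shows what Spark actually inferred.

```python
def coefficient_matrix_shape(labels, num_features, family="auto", num_classes_meta=None):
    """Sketch of how LogisticRegression sizes coefficientMatrix (not Spark code).

    num_classes_meta is a hypothetical stand-in for a class count stored in
    the label column's metadata; None means there is no such metadata.
    """
    # Without metadata, the class count is max(label) + 1, so labels are
    # expected to be 0, 1, ..., K-1
    if num_classes_meta is not None:
        num_classes = num_classes_meta
    else:
        num_classes = int(max(labels)) + 1
    if family == "binomial":
        if num_classes > 2:
            raise ValueError(f"binomial family needs at most 2 classes, got {num_classes}")
        multinomial = False
    elif family == "multinomial":
        multinomial = True
    elif family == "auto":
        multinomial = num_classes > 2
    else:
        raise ValueError(f"unknown family: {family}")
    rows = num_classes if multinomial else 1
    return rows, num_features


print(coefficient_matrix_shape([0.0, 1.0, 1.0], 23))            # (1, 23)
print(coefficient_matrix_shape([1.0, 2.0, 2.0], 23))            # (3, 23)
print(coefficient_matrix_shape([0.0, 1.0], 23, "multinomial"))  # (2, 23)
```

Under this rule, passing `family="binomial"` with 3 inferred classes fails instead of producing one row, so the labels themselves would need to be re-encoded as 0 and 1.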

from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator

# weightedRecall is a MulticlassClassificationEvaluator metric
evaluator = MulticlassClassificationEvaluator(metricName="weightedRecall",
                                              predictionCol="prediction",
                                              labelCol="label")

# train_df / test_df: "features" vector column, "label" column and
# a per-row weight column "classWeights"
lr = LogisticRegression(labelCol="label", featuresCol="features",
                        weightCol="classWeights")

# elasticNetParam=1.0 means a pure L1 (lasso) penalty
paramGrid = (ParamGridBuilder()
             .addGrid(lr.elasticNetParam, [1.0])
             .addGrid(lr.maxIter, [10])
             .addGrid(lr.regParam, [0.01, 0.5, 2.0])
             .build())

cv = CrossValidator(estimator=lr, estimatorParamMaps=paramGrid,
                    evaluator=evaluator, numFolds=2)
%time cvModel = cv.fit(train_df)   # %time is an IPython/Jupyter magic
predict_test_hyp = cvModel.transform(test_df)

# The best LogisticRegressionModel found by cross-validation
best = cvModel.bestModel
coef = best.coefficientMatrix

Output when I convert it to a dense matrix (3 rows, 23 columns):

DenseMatrix([

        [ 5.66693393e-01,  0.00000000e+00, -8.52316465e-09,
           6.64542431e-03,  0.00000000e+00,  5.34390416e-02,
          -4.51579298e-02,  0.00000000e+00,  0.00000000e+00,
          -4.51579298e-02,  0.00000000e+00,  0.00000000e+00,
           0.00000000e+00,  3.16000659e-02,  9.72526723e-01,
           0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
           0.00000000e+00,  0.00000000e+00, -4.70342863e-03,
           0.00000000e+00, -1.75045505e-01],

         [-2.41420676e-01,  1.67133402e-07,  8.30360611e-08,
          -6.51168658e-03,  1.04331113e+00,  2.14565081e-01,
           4.11803774e-01,  0.00000000e+00,  9.97492835e-08,
           4.11803774e-01,  0.00000000e+00,  0.00000000e+00,
           2.56259268e-01,  2.52040849e-02, -8.22050592e-01,
           6.76655408e-01,  0.00000000e+00,  1.37646488e-01,
           0.00000000e+00,  4.33575071e-02,  6.79627660e-03,
           3.14889764e-01,  4.54933918e-01],

         [-1.09990806e-01,  0.00000000e+00,  0.00000000e+00,
           0.00000000e+00, -6.04906840e-01, -4.63173578e-01,
           0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
           0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
           0.00000000e+00, -8.73277227e-02,  0.00000000e+00,
          -1.98500866e-01,  0.00000000e+00,  0.00000000e+00,
           0.00000000e+00, -3.00089689e-01,  0.00000000e+00,
           0.00000000e+00,  0.00000000e+00]])
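To inspect the posted matrix directly, here is a small NumPy sketch (the helper `zero_columns` is hypothetical) that lists the 0-based feature indices whose coefficient is zero in all three rows. The values are copied from the output above. The tolerance is there because L1-regularized fits can leave tiny residues such as -8.52e-09; the 1e-6 cutoff is an arbitrary choice.

```python
import numpy as np


def zero_columns(coef, tol=0.0):
    """0-based indices of features whose |coefficient| <= tol in every row."""
    coef = np.atleast_2d(np.asarray(coef, dtype=float))
    return np.flatnonzero(np.all(np.abs(coef) <= tol, axis=0)).tolist()


# Coefficient matrix copied from the question (one row per class)
coef = [
    [5.66693393e-01, 0.0, -8.52316465e-09, 6.64542431e-03, 0.0, 5.34390416e-02,
     -4.51579298e-02, 0.0, 0.0, -4.51579298e-02, 0.0, 0.0,
     0.0, 3.16000659e-02, 9.72526723e-01, 0.0, 0.0, 0.0,
     0.0, 0.0, -4.70342863e-03, 0.0, -1.75045505e-01],
    [-2.41420676e-01, 1.67133402e-07, 8.30360611e-08, -6.51168658e-03, 1.04331113e+00, 2.14565081e-01,
     4.11803774e-01, 0.0, 9.97492835e-08, 4.11803774e-01, 0.0, 0.0,
     2.56259268e-01, 2.52040849e-02, -8.22050592e-01, 6.76655408e-01, 0.0, 1.37646488e-01,
     0.0, 4.33575071e-02, 6.79627660e-03, 3.14889764e-01, 4.54933918e-01],
    [-1.09990806e-01, 0.0, 0.0, 0.0, -6.04906840e-01, -4.63173578e-01,
     0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
     0.0, -8.73277227e-02, 0.0, -1.98500866e-01, 0.0, 0.0,
     0.0, -3.00089689e-01, 0.0, 0.0, 0.0],
]

print(zero_columns(coef))            # [7, 10, 11, 16, 18]
print(zero_columns(coef, tol=1e-6))  # [1, 2, 7, 8, 10, 11, 16, 18]
```

Only a column that is zero in every row means the feature is unused by the whole model; a zero in a single row only means it does not affect that one class's score.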
