PySpark: load libsvm data and train an LR model

Date: 2018-06-25 06:07:02

Tags: apache-spark pyspark logistic-regression libsvm

I want to load libsvm data into PySpark and use it to train a logistic regression model.

# load libsvm data
data = spark.read.format("libsvm").load(data_path)
data.show()

+-----+--------------------+
|label|            features|
+-----+--------------------+
|  0.0|(148,[1,2,28,29,3...|
|  0.0|(148,[1,2,28,29,3...|
|  0.0|(148,[0,1,2,27,28...|
|  0.0|(148,[1,2,28,29,3...|
|  1.0|(148,[0,1,2,28,29...|
|  0.0|(148,[0,1,2,3,5,6...|
|  1.0|(148,[0,1,2,28,29...|
|  0.0|(148,[1,2,28,29,3...|
|  1.0|(148,[1,2,28,29,3...|
|  0.0|(148,[1,2,28,29,8...|
+-----+--------------------+

I then tried to train a logistic regression model and use it to predict my test data.

# train Logistic Regression Model
from pyspark.ml.classification import LogisticRegression

lr = LogisticRegression(maxIter=100,
                        regParam=0.5,
                        elasticNetParam=0.5)

# Fit the model on the training split
lrModel = lr.fit(train)

# Transform the validation data to get predictions
predictions = lrModel.transform(validation)

predictions.show()

+-----+--------------------+--------------------+--------------------+----------+
|label|            features|       rawPrediction|         probability|prediction|
+-----+--------------------+--------------------+--------------------+----------+
|  0.0|(148,[0,1,2,3,4,5...|[0.62606419766465...|[0.65159649618536...|       0.0|
|  0.0|(148,[0,1,2,3,4,5...|[0.62606419766465...|[0.65159649618536...|       0.0|
|  0.0|(148,[0,1,2,3,5,6...|[0.62606419766465...|[0.65159649618536...|       0.0|
|  1.0|(148,[0,1,2,3,5,6...|[0.62606419766465...|[0.65159649618536...|       0.0|
|  1.0|(148,[0,1,2,3,5,6...|[0.62606419766465...|[0.65159649618536...|       0.0|
|  0.0|(148,[0,1,2,3,5,6...|[0.62606419766465...|[0.65159649618536...|       0.0|
|  0.0|(148,[0,1,2,3,5,6...|[0.62606419766465...|[0.65159649618536...|       0.0|
|  0.0|(148,[0,1,2,3,6,7...|[0.62606419766465...|[0.65159649618536...|       0.0|
|  1.0|(148,[0,1,2,3,6,7...|[0.62606419766465...|[0.65159649618536...|       0.0|
|  0.0|(148,[0,1,2,3,6,7...|[0.62606419766465...|[0.65159649618536...|       0.0|
+-----+--------------------+--------------------+--------------------+----------+

The model does not seem to be working properly: it returns the same prediction for every row.

Why is this happening, and how can I fix it? Many thanks!

0 Answers:

No answers yet