I want to load libsvm data into PySpark and use it to train a logistic regression model.
# load libsvm data
data = spark.read.format("libsvm").load(data_path)
data.show()
+-----+--------------------+
|label| features|
+-----+--------------------+
| 0.0|(148,[1,2,28,29,3...|
| 0.0|(148,[1,2,28,29,3...|
| 0.0|(148,[0,1,2,27,28...|
| 0.0|(148,[1,2,28,29,3...|
| 1.0|(148,[0,1,2,28,29...|
| 0.0|(148,[0,1,2,3,5,6...|
| 1.0|(148,[0,1,2,28,29...|
| 0.0|(148,[1,2,28,29,3...|
| 1.0|(148,[1,2,28,29,3...|
| 0.0|(148,[1,2,28,29,8...|
+-----+--------------------+
I tried to train a logistic regression model and use it to make predictions on my test data.
# train Logistic Regression Model
from pyspark.ml.classification import LogisticRegression

lr = LogisticRegression(maxIter=100,
                        regParam=0.5,
                        elasticNetParam=0.5)
# Fits the model
lrModel = lr.fit(train)
# transform data (prediction)
predictions = lrModel.transform(validation)
predictions.show()
+-----+--------------------+--------------------+--------------------+----------+
|label| features| rawPrediction| probability|prediction|
+-----+--------------------+--------------------+--------------------+----------+
| 0.0|(148,[0,1,2,3,4,5...|[0.62606419766465...|[0.65159649618536...| 0.0|
| 0.0|(148,[0,1,2,3,4,5...|[0.62606419766465...|[0.65159649618536...| 0.0|
| 0.0|(148,[0,1,2,3,5,6...|[0.62606419766465...|[0.65159649618536...| 0.0|
| 1.0|(148,[0,1,2,3,5,6...|[0.62606419766465...|[0.65159649618536...| 0.0|
| 1.0|(148,[0,1,2,3,5,6...|[0.62606419766465...|[0.65159649618536...| 0.0|
| 0.0|(148,[0,1,2,3,5,6...|[0.62606419766465...|[0.65159649618536...| 0.0|
| 0.0|(148,[0,1,2,3,5,6...|[0.62606419766465...|[0.65159649618536...| 0.0|
| 0.0|(148,[0,1,2,3,6,7...|[0.62606419766465...|[0.65159649618536...| 0.0|
| 1.0|(148,[0,1,2,3,6,7...|[0.62606419766465...|[0.65159649618536...| 0.0|
| 0.0|(148,[0,1,2,3,6,7...|[0.62606419766465...|[0.65159649618536...| 0.0|
The model does not seem to work properly: it returns the same prediction for every row.
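A quick sanity check on the numbers in the output above, offered as a sketch rather than a diagnosis: in binary logistic regression the probability of class 0 is the sigmoid of the class-0 raw score, and a model whose coefficients are all zero yields the same score (set by the intercept alone) for every row. The helper `class0_probability` and the toy feature rows are hypothetical; only the value 0.62606419766465 comes from the (truncated) output.

```python
import math

def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def class0_probability(features, coefficients, intercept):
    """Probability of class 0 for a binary logistic model (hypothetical helper)."""
    # The class-1 margin is w.x + b; the class-0 raw score is its negative.
    margin = sum(w * x for w, x in zip(coefficients, features)) + intercept
    return sigmoid(-margin)

# Class-0 raw score copied from the truncated rawPrediction column.
raw_class0 = 0.62606419766465
reported_probability = sigmoid(raw_class0)

# Assumption for illustration: all coefficients are zero, so only the intercept matters.
zero_coefficients = [0.0, 0.0, 0.0]
intercept = -raw_class0
rows = [[1.0, 0.0, 2.0], [0.0, 5.0, 1.0], [3.0, 3.0, 3.0]]  # made-up feature rows
probabilities = [class0_probability(r, zero_coefficients, intercept) for r in rows]

print(reported_probability)
print(probabilities)
```

The sigmoid of the raw score matches the value in the probability column, and every made-up row gets that same probability, which is consistent with an intercept-only model.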
Why does this happen, and how can I fix it? Thanks a lot!
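One possible cause of identical predictions is that regularization has shrunk every coefficient to zero, leaving only the intercept. A minimal sketch for checking this follows. `summarize_fit` is a hypothetical helper that relies only on the `coefficients` and `intercept` attributes of a fitted binomial `LogisticRegressionModel` and on `numNonzeros()` of the coefficient vector. The stand-in model object is made up so that the snippet runs without a Spark session.

```python
from types import SimpleNamespace

def summarize_fit(model):
    """Return (number of nonzero coefficients, intercept) of a fitted binary model."""
    return model.coefficients.numNonzeros(), model.intercept

class _ZeroVector:
    """Made-up stand-in for a Spark vector whose entries are all zero."""
    def numNonzeros(self):
        return 0

# Hypothetical stand-in for lrModel; the intercept value is assumed, not measured.
fake_model = SimpleNamespace(coefficients=_ZeroVector(), intercept=-0.62606419766465)

nonzeros, intercept = summarize_fit(fake_model)
if nonzeros == 0:
    print("intercept-only model: every row gets the same prediction")
```

With the real model the call would be `summarize_fit(lrModel)`. If it reports zero nonzero coefficients, the combination of `regParam=0.5` and `elasticNetParam=0.5` is a likely suspect, and refitting with a much smaller `regParam` (for example 0.01, an arbitrary illustrative value) would be a reasonable next experiment.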