我在scala中运行逻辑回归模型,我有一个如下所示的数据框:
DF
+-----------+------------+
|x |y |
+-----------+------------+
| 0| 0|
| 0| 33|
| 0| 58|
| 0| 96|
| 0| 1|
| 1| 21|
| 0| 10|
| 0| 65|
| 1| 7|
| 1| 28|
+-----------+------------+
我需要将其转换为类似这样的内容
+-----+------------------+
|label| features |
+-----+------------------+
| 0.0|(1,[1],[0]) |
| 0.0|(1,[1],[33]) |
| 0.0|(1,[1],[58]) |
| 0.0|(1,[1],[96]) |
| 0.0|(1,[1],[1]) |
| 1.0|(1,[1],[21]) |
| 0.0|(1,[1],[10]) |
| 0.0|(1,[1],[65]) |
| 1.0|(1,[1],[7]) |
| 1.0|(1,[1],[28]) |
+-----------+------------+
我试过
val lr = new LogisticRegression()
.setMaxIter(10)
.setRegParam(0.3)
.setElasticNetParam(0.8)
val assembler = new VectorAssembler()
.setInputCols(Array("x"))
.setOutputCol("Feature")
var lrModel= lr.fit(daf.withColumnRenamed("x","label").withColumnRenamed("y","features"))
感谢任何帮助。
答案 0 :(得分:2)
鉴于dataframe
为
+---+---+
|x |y |
+---+---+
|0 |0 |
|0 |33 |
|0 |58 |
|0 |96 |
|0 |1 |
|1 |21 |
|0 |10 |
|0 |65 |
|1 |7 |
|1 |28 |
+---+---+
如下所示
val assembler = new VectorAssembler()
.setInputCols(Array("x", "y"))
.setOutputCol("features")
val output = assembler.transform(df).select($"x".cast(DoubleType).as("label"), $"features")
output.show(false)
会给你结果
+-----+----------+
|label|features |
+-----+----------+
|0.0 |(2,[],[]) |
|0.0 |[0.0,33.0]|
|0.0 |[0.0,58.0]|
|0.0 |[0.0,96.0]|
|0.0 |[0.0,1.0] |
|1.0 |[1.0,21.0]|
|0.0 |[0.0,10.0]|
|0.0 |[0.0,65.0]|
|1.0 |[1.0,7.0] |
|1.0 |[1.0,28.0]|
+-----+----------+
现在使用LogisticRegression
很容易
val lr = new LogisticRegression()
.setMaxIter(10)
.setRegParam(0.3)
.setElasticNetParam(0.8)
val lrModel = lr.fit(output)
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")
您的输出为
Coefficients: [1.5672602877378823,0.0] Intercept: -1.4055020984891717