Question

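I am trying to build a model in Spark ML using Zeppelin. I am new to this area and would like some help. I think I need to set the correct data types on the columns and set the first column as the label. Any help is greatly appreciated, thank you.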

val training = sc.textFile("hdfs:///ford/fordTrain.csv")
val header = training.first
val inferSchema = true  
val df = training.toDF

val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)

val lrModel = lr.fit(df)

// Print the coefficients and intercept for multinomial logistic regression
println(s"Coefficients: \n${lrModel.coefficientMatrix}")
println(s"Intercepts: ${lrModel.interceptVector}")

A snippet of the CSV file I am using:

IsAlert,P1,P2,P3,P4,P5,P6,P7,P8,E1,E2
0,34.7406,9.84593,1400,42.8571,0.290601,572,104.895,0,0,0,

Answer 1

As you mentioned, you are missing the features column. It is a vector containing all the predictor variables. You have to create it using VectorAssembler.
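
To make the idea of a features vector concrete, here is a minimal sketch that runs VectorAssembler on a tiny in-memory DataFrame. The two rows and the choice of only P1 and P2 are made up for illustration, and the session setup assumes Spark 2.x or later (in Zeppelin, getOrCreate() simply reuses the existing session).

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x+. In Zeppelin this returns the session that already exists.
val spark = SparkSession.builder().appName("assembler-sketch").master("local[*]").getOrCreate()

// Two made-up rows: a label plus two numeric predictors (illustrative values only).
val toy = spark.createDataFrame(Seq(
  (0, 34.7406, 9.84593),
  (1, 30.0, 8.0)
)).toDF("IsAlert", "P1", "P2")

// Pack the predictor columns into a single vector column named "features".
val assembler = new VectorAssembler().setInputCols(Array("P1", "P2")).setOutputCol("features")

val assembled = assembler.transform(toy)
assembled.printSchema()
assembled.show(false)
```

The original columns are kept; the assembler only appends the new vector column, which is the column the estimator reads as its input.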

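IsAlert is the label and all the other variables (P1, P2, ...) are predictors. You can create the features column with VectorAssembler by listing the predictor columns as its inputs. You can actually name the output column anything you want instead of features, as long as the estimator is pointed at the same name with setFeaturesCol.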
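
A minimal sketch of how this could look for the data in the question, assuming Spark 2.1 or later and that the file contains the columns shown in the CSV snippet. The helper names loadCsv and fitModel are hypothetical, introduced only for this example. Reading through spark.read with header and inferSchema (instead of sc.textFile) is what gives named, numeric columns.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}

// Predictor names taken from the CSV header shown in the question.
val predictorCols = Array("P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8", "E1", "E2")

// Hypothetical helper: read a CSV with a header row and let Spark infer column types.
def loadCsv(spark: SparkSession, path: String): DataFrame =
  spark.read.option("header", "true").option("inferSchema", "true").csv(path)

// Hypothetical helper: assemble the predictors, then fit the question's estimator.
def fitModel(df: DataFrame): PipelineModel = {
  val assembler = new VectorAssembler()
    .setInputCols(predictorCols)
    .setOutputCol("features")

  val lr = new LogisticRegression()
    .setMaxIter(10)
    .setRegParam(0.3)
    .setElasticNetParam(0.8)
    .setFeaturesCol("features") // must match the assembler's output column
    .setLabelCol("IsAlert")     // the first CSV column is the label

  // The pipeline runs the assembler first, then trains on its output.
  new Pipeline().setStages(Array(assembler, lr)).fit(df)
}
```

Usage, with the path from the question: `val model = fitModel(loadCsv(spark, "hdfs:///ford/fordTrain.csv"))`. The fitted logistic regression is the last pipeline stage, so `model.stages.last.asInstanceOf[LogisticRegressionModel]` exposes the `coefficientMatrix` and `interceptVector` that the question prints.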

Reference: https://spark.apache.org/docs/latest/ml-features.html#vectorassembler

Field "features" does not exist. SparkML
