I have the following data that I want to use for forecasting:
demanda.show
+-------------------+-----------------+
|               date|           demand|
+-------------------+-----------------+
|2011-01-01 00:15:00|68513.11533807972|
|2011-01-01 00:30:00|69180.30436102377|
|2011-01-01 00:45:00|69364.85057321166|
|2011-01-01 01:00:00|68350.48066028186|
|2011-01-01 01:15:00|66166.87430261481|
|2011-01-01 01:30:00| 64843.1919395499|
|2011-01-01 01:45:00|66017.96384408326|
|2011-01-01 02:00:00|65345.51379388567|
|2011-01-01 02:15:00|65567.57817136438|
|2011-01-01 02:30:00|65765.80224690547|
|2011-01-01 02:45:00|67245.32532103594|
|2011-01-01 03:00:00|68103.69418448425|
|2011-01-01 03:15:00|65008.59392258343|
|2011-01-01 03:30:00| 66561.9182807065|
|2011-01-01 03:45:00|66631.92787613077|
|2011-01-01 04:00:00|65307.52861877842|
|2011-01-01 04:15:00|64336.19586473244|
|2011-01-01 04:30:00| 64751.6848532353|
|2011-01-01 04:45:00|65458.80136387812|
|2011-01-01 05:00:00|64744.29165508993|
+-------------------+-----------------+
only showing top 20 rows
This is a DataFrame with 140256 rows. The problem is that I don't know how to pass the date to the regression algorithm.
I did the following:
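import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

// Label = demand (column 1), single feature = date (column 0).
// The cast of the date column to Double is the step in question.
val parsedData = demanda.rdd.map(x => LabeledPoint(x(1).asInstanceOf[Double], Vectors.dense(x(0).asInstanceOf[Double]))).cache()
val numIterations = 10
val stepSize = 0.00000001
val model = LinearRegressionWithSGD.train(parsedData, numIterations, stepSize)
// Pair each actual label with the model's prediction for the same point.
val valuesAndPreds = parsedData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
// _1 is the actual label, _2 is the prediction.
valuesAndPreds.foreach(result => println(s"actual label: ${result._1}, predicted label: ${result._2}"))
val MSE = valuesAndPreds.map { case (v, p) => math.pow(v - p, 2) }.mean()
println("training Mean Squared Error = " + MSE)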
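A note on the date cast in the code above: `asInstanceOf[Double]` does not convert anything, so if the date column holds a timestamp or a string the cast would presumably throw rather than yield a number. One way to get a numeric value is to measure elapsed time from a fixed origin. The sketch below is plain Scala without Spark; the object `EpochFeature`, the helper `hoursSinceOrigin`, and the choice of the first row as origin are assumptions for illustration, not part of the original code.

```scala
import java.time.{Duration, LocalDateTime}
import java.time.format.DateTimeFormatter

object EpochFeature {
  // Format assumed from the `demanda.show` output.
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Assumed origin: the first timestamp shown in the table.
  private val origin = LocalDateTime.parse("2011-01-01 00:15:00", fmt)

  // Hypothetical helper: hours elapsed between the origin and `ts`.
  def hoursSinceOrigin(ts: String): Double = {
    val t = LocalDateTime.parse(ts, fmt)
    Duration.between(origin, t).getSeconds / 3600.0
  }
}
```

Measuring from a nearby origin keeps the feature small. Raw epoch seconds for 2011 are on the order of 10^9, and gradient descent tends to be very sensitive to the step size with features that large, which may be why such a tiny `stepSize` was needed.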
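The MSE it produces is very high, so the regression is not good. The data looks like this when plotted: (image: plot of the data)
So how should I do the regression? How do I pass the date values?
Thanks!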
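One possible way to pass the date values, sketched under assumptions: since demand sampled every 15 minutes usually follows daily and weekly patterns, the timestamp can be expanded into several numeric features instead of a single number. The example below is plain Scala without Spark; `DateFeatures` and `features` are hypothetical names, and the particular features chosen (time of day encoded as sine and cosine, plus day of week) are an illustration rather than a recommendation tuned to this data set.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object DateFeatures {
  // Format assumed from the `demanda.show` output.
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Hypothetical helper: turns a timestamp string into numeric features.
  def features(ts: String): Array[Double] = {
    val t = LocalDateTime.parse(ts, fmt)
    val minuteOfDay = t.getHour * 60 + t.getMinute
    // Map the time of day onto a circle so 23:45 and 00:00 end up close together.
    val angle = 2 * math.Pi * minuteOfDay / (24 * 60)
    Array(
      math.sin(angle),
      math.cos(angle),
      t.getDayOfWeek.getValue.toDouble // 1 = Monday ... 7 = Sunday
    )
  }
}
```

Assuming the date is available as a string in the format shown, each row could then become something like `LabeledPoint(demand, Vectors.dense(DateFeatures.features(dateString)))`. A linear model on these features can only capture a single daily harmonic, so this is a starting point rather than a full forecasting approach.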