How to create a Spark DataFrame to run Linear Regression in Spark 2 ML

Time: 2016-10-07 06:31:26

Tags: apache-spark dataframe pyspark

I am working with Spark 2 and have created a DataFrame:

sdf=sqlContext.createDataFrame(a)
sdf.show(5)
+-------+-----+-----+--------+--------+-------+
|success| apt1| apr2|extraver|itdegree|otherit|
+-------+-----+-----+--------+--------+-------+
|   68.0|117.0|104.0|    27.0|     0.0|    0.0|
|   36.0| 93.0| 90.0|    43.0|     0.0|    0.0|
|   25.0|101.0| 96.0|    48.0|     1.0|    0.0|
|   36.0|116.0|108.0|    59.0|     0.0|    0.0|
|   35.0|103.0| 92.0|    45.0|     1.0|    0.0|
+-------+-----+-----+--------+--------+-------+

I am trying to run a linear regression using ML.

How do I create the data for ML?

>>> from pyspark.ml.linalg import Vectors
>>> from pyspark.ml.regression import LinearRegression
>>> sdf1 = spark.createDataFrame([
...     (1.0, 2.0, Vectors.dense(1.0)),
...     (0.0, 2.0, Vectors.sparse(1, [], []))], ["label", "weight", "features"])
>>> lr = LinearRegression(maxIter=5, regParam=0.0, solver="normal", weightCol="weight")
>>> model = lr.fit(sdf1)

success is the dependent variable.
The other columns are the independent variables.

0 answers:

No answers