I am working with Spark 2 and have created a DataFrame:
sdf=sqlContext.createDataFrame(a)
sdf.show(5)
+-------+-----+-----+--------+--------+-------+
|success| apt1| apr2|extraver|itdegree|otherit|
+-------+-----+-----+--------+--------+-------+
|   68.0|117.0|104.0|    27.0|     0.0|    0.0|
|   36.0| 93.0| 90.0|    43.0|     0.0|    0.0|
|   25.0|101.0| 96.0|    48.0|     1.0|    0.0|
|   36.0|116.0|108.0|    59.0|     0.0|    0.0|
|   35.0|103.0| 92.0|    45.0|     1.0|    0.0|
+-------+-----+-----+--------+--------+-------+
I am trying to run a linear regression on it using ML (pyspark.ml).
How do I create the data for ML?
>>> from pyspark.ml.linalg import Vectors
>>> from pyspark.ml.regression import LinearRegression
>>> sdf1 = spark.createDataFrame([
...     (1.0, 2.0, Vectors.dense(1.0)),
...     (0.0, 2.0, Vectors.sparse(1, [], []))], ["label", "weight", "features"])
>>> lr = LinearRegression(maxIter=5, regParam=0.0, solver="normal", weightCol="weight")
>>> model = lr.fit(sdf1)
`success` is the dependent variable.
The other columns are the independent variables.