Spark - Simple Linear Regression

Time: 2018-09-09 03:30:47

Tags: apache-spark

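I'm new to Spark and am trying a simple linear regression, but I can't figure out how to resolve this error. Could someone with more experience please help? The dataset is at http://archive.ics.uci.edu/ml/machine-learning-databases/00294/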

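from pyspark.sql import SparkSession
from pyspark.ml.regression import LinearRegression
from pyspark.ml.feature import VectorAssembler

# In the pyspark shell or a notebook `spark` already exists; getOrCreate() reuses it
spark = SparkSession.builder.getOrCreate()

pp_df = spark.read.csv('D:/OneDrive/MachineLearning/Spark/CCPP/CCPP/Folds5x2_pp.csv',
                       header=True, inferSchema=True)

# Pack the four predictor columns into a single vector column named "features"
vectorAssembler = VectorAssembler(inputCols=["AT", "V", "AP", "RH"],
                                  outputCol="features")
vpp_df = vectorAssembler.transform(pp_df)

# Keep only the features and the target, renaming PE to "label"
vd = vpp_df.select("features", "PE")
vdN = vd.selectExpr("features", "PE as label")

lr = LinearRegression(featuresCol="features", labelCol="label")
lr_model = lr.fit(vdN)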
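For reference, with its default parameters (`regParam=0.0`, `fitIntercept=True`), `LinearRegression` should reduce to an ordinary least-squares fit of the label on the four assembled features. Written with the column names used here, the model and the quantity being minimized are:

```latex
\begin{aligned}
\widehat{\mathrm{PE}}_i &= \beta_0 + \beta_1\,\mathrm{AT}_i + \beta_2\,\mathrm{V}_i + \beta_3\,\mathrm{AP}_i + \beta_4\,\mathrm{RH}_i \\
(\hat\beta_0, \dots, \hat\beta_4) &= \arg\min_{\beta} \sum_{i=1}^{n} \bigl(\mathrm{PE}_i - \widehat{\mathrm{PE}}_i\bigr)^2
\end{aligned}
```

Once `lr.fit` succeeds, the estimates are exposed as `lr_model.intercept` (the intercept) and `lr_model.coefficients` (the four slopes, in the same order as `inputCols`).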

  pp_df.take(1)
  [Row(AT=14.96, V=41.76, AP=1024.07, RH=73.17, PE=463.26)]

  pp_df.dtypes

  [('AT', 'double'),
   ('V', 'double'),
   ('AP', 'double'),
   ('RH', 'double'),
   ('PE', 'double')]

  vpp_df.take(1)

  [Row(AT=14.96, V=41.76, AP=1024.07, RH=73.17, PE=463.26, 
   features=DenseVector([14.96, 41.76, 1024.07, 73.17]))]

  vdN.take(1) 
  [Row(features=DenseVector([14.96, 41.76, 1024.07, 73.17]), label=463.26)]
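The outputs above only inspect the first row (`take(1)`), while `lr.fit` makes a pass over every row of `vdN`, so a bad value further down the file would only surface during the fit. One thing that may be worth ruling out (an assumption, not a confirmed diagnosis) is a null or NaN somewhere in the five columns; in Spark 2.3, `VectorAssembler` is expected to raise an error when it meets a null input. A minimal sketch that builds Spark SQL expressions for `selectExpr` counting such values per column; the helper name `null_count_exprs` and the `_bad` alias suffix are hypothetical, and the columns are assumed to be numeric (as `pp_df.dtypes` shows):

```python
def null_count_exprs(columns):
    """Build Spark SQL expressions that count NULL or NaN values per column.

    Hypothetical helper: intended for DataFrame.selectExpr, assumes numeric
    (e.g. double) columns so that isnan() applies.
    """
    return [
        # Backticks quote the column name in case it contains spaces
        f"sum(CASE WHEN `{c}` IS NULL OR isnan(`{c}`) THEN 1 ELSE 0 END) AS `{c}_bad`"
        for c in columns
    ]


# Example use in the session from the question (requires a running SparkSession):
# pp_df.selectExpr(*null_count_exprs(pp_df.columns)).show()
```

Any non-zero count would point at rows that could be dropped (for example with `pp_df.na.drop()`, which removes rows containing null or NaN values) before running the assembler.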

This is the error I see:


Py4JJavaError                             Traceback (most recent call last)
in <module>()
----> 1 lr_model = lr.fit(vdN)

D:\Spark\spark-2.3.1-bin-hadoop2.7\python\pyspark\ml\base.py in fit(self, dataset, params)
    130                 return self.copy(params)._fit(dataset)
    131             else:
--> 132                 return self._fit(dataset)
    133         else:
    134             raise ValueError("Params must be either a param map or a list/tuple of param maps, "
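The traceback above stops in the Python frames of `pyspark/ml/base.py`; the actual reason for the failure is carried by the Java exception wrapped inside the `Py4JJavaError` (py4j exposes it as the `java_exception` attribute, and the full printed error normally continues with a Java stack trace). A minimal sketch for pulling out the innermost Java cause; `java_root_cause` is a hypothetical helper that relies only on the standard `java.lang.Throwable` methods `getCause()` and `toString()`, called through py4j:

```python
def java_root_cause(exc):
    """Return the innermost Java exception text carried by a Py4JJavaError.

    Hypothetical helper using duck typing: py4j's Py4JJavaError stores the
    Java Throwable in `exc.java_exception`; any exception without that
    attribute falls back to str(exc).
    """
    java_exc = getattr(exc, "java_exception", None)
    if java_exc is None:
        return str(exc)
    root = java_exc
    # Follow Throwable.getCause() (Java null arrives as None) to the root cause;
    # the depth limit guards against a pathological cyclic cause chain.
    for _ in range(50):
        cause = root.getCause()
        if cause is None:
            break
        root = cause
    return root.toString()


# Example use in the session from the question (requires pyspark/py4j):
# from py4j.protocol import Py4JJavaError
# try:
#     lr_model = lr.fit(vdN)
# except Py4JJavaError as e:
#     print(java_root_cause(e))
```

Printing the root cause this way is meant to surface the line that usually names the real problem (a bad input value, an environment mismatch, and so on), which the Python-side frames alone do not show.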

0 answers