Error when creating a custom Transformer in PySpark ML

Time: 2019-06-28 21:24:17

Tags: apache-spark apache-spark-mllib

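I have created a custom Transformer in PySpark ML. When I use it as a stage in a pipeline that is fitted through a cross-validator to train a model, I get the following error: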

    trained_pipeline = crossval.fit(train_split).bestModel
    File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/base.py", line 132, in fit
      return self._fit(dataset)
    File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/tuning.py", line 303, in _fit
      tasks = _parallelFitTasks(est, train, eva, validation, epm, collectSubModelsParam)
    File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/tuning.py", line 49, in _parallelFitTasks
      modelIter = est.fitMultiple(train, epm)
    File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/base.py", line 103, in fitMultiple
      estimator = self.copy()
    File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 128, in copy
      stages = [stage.copy(extra) for stage in that.getStages()]
    File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 128, in <listcomp>
      stages = [stage.copy(extra) for stage in that.getStages()]
    File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/param/__init__.py", line 383, in copy
      that._paramMap = {}
    AttributeError: 'dict' object has no attribute '_paramMap'
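The last frame is inside the `copy` method in `pyspark/ml/param/__init__.py`, and the message says the object it tried to set `_paramMap` on was a plain `dict` rather than a pipeline stage. As far as I can tell, that method first makes a shallow copy of the stage and then resets its param map. The pure-Python sketch below imitates only that step. The names `copy_like_params`, `GoodStage`, `BadStage` and `try_copy` are hypothetical stand-ins, not PySpark code, and the `__copy__` hook is just one assumed way a shallow copy could come back as a dict. It is not a diagnosis of my transformer.

```python
import copy


def copy_like_params(stage):
    """Hypothetical stand-in for the step the traceback points at."""
    that = copy.copy(stage)   # shallow copy of the stage object
    that._paramMap = {}       # raises AttributeError if `that` is a plain dict
    return that


class GoodStage:
    """Ordinary object: copy.copy returns another GoodStage."""

    def __init__(self):
        self._paramMap = {"k": 1}


class BadStage(GoodStage):
    """Assumption for illustration only: a __copy__ hook that returns a dict."""

    def __copy__(self):
        return dict(self.__dict__)


def try_copy(stage):
    """Return the copied object's type name, or the error message on failure."""
    try:
        return type(copy_like_params(stage)).__name__
    except AttributeError as exc:
        return str(exc)


good_result = try_copy(GoodStage())
bad_result = try_copy(BadStage())
print(good_result)
print(bad_result)
```

If this reading is right, the thing to check is why copying the stage yields a dict, for example whether something other than a Transformer instance ended up in the pipeline's stages.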

The custom Transformer code is as follows:

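    from pyspark.ml import Transformer
    from pyspark.sql import functions as F

    class MyCustomTransformer(Transformer):

        def _transform(self, input_df):
            # col1 and W are names defined elsewhere; "..." is elided in the original post.
            new_df = input_df.withColumn(
                "newcol",
                F.when(F.col(col1) == F.lit("A"), F.col(W))
                .otherwise(
                    F.when(F.col(col1) == F.lit("B"), F.col("X"))
                    .otherwise(...)
                ),
            )
            return new_df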

Can anyone tell me what I am doing wrong or what I am missing?

0 answers:

No answers yet.