I have created a custom Transformer in PySpark ML, and when I try to use it as part of the pipeline I use to train a model, I get the following error:
trained_pipeline = crossval.fit(train_split).bestModel
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/base.py", line 132, in fit
    return self._fit(dataset)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/tuning.py", line 303, in _fit
    tasks = _parallelFitTasks(est, train, eva, validation, epm, collectSubModelsParam)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/tuning.py", line 49, in _parallelFitTasks
    modelIter = est.fitMultiple(train, epm)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/base.py", line 103, in fitMultiple
    estimator = self.copy()
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 128, in copy
    stages = [stage.copy(extra) for stage in that.getStages()]
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 128, in <listcomp>
    stages = [stage.copy(extra) for stage in that.getStages()]
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/param/__init__.py", line 383, in copy
    that._paramMap = {}
AttributeError: 'dict' object has no attribute '_paramMap'
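The last frame fails at `that._paramMap = {}` inside `copy`, right after `Pipeline.copy` calls `stage.copy(extra)` on every stage. The sketch below uses a simplified, hypothetical stand-in for that logic (`StandInParams`, `StandInTransformer` and `copy_stages` are illustrative names, not PySpark's source) to show one way this exact message can arise. If a stage is the class itself rather than an instance (for example `stages=[MyCustomTransformer]` instead of `stages=[MyCustomTransformer()]`; the pipeline definition is not shown here, so this is an assumption), then `stage.copy(extra)` binds the `extra` dict to `self`, `copy.copy` returns a dict, and assigning `_paramMap` on it raises `AttributeError`.

```python
import copy


class StandInParams:
    """Hypothetical, simplified stand-in for pyspark.ml.param.Params (not the real source)."""

    def __init__(self):
        self._paramMap = {}

    def copy(self, extra=None):
        if extra is None:
            extra = dict()
        that = copy.copy(self)  # shallow copy of whatever object `self` is
        that._paramMap = {}     # fails when `self` is a plain dict
        return that


class StandInTransformer(StandInParams):
    """Hypothetical stand-in for a custom Transformer."""


def copy_stages(stages, extra):
    # Mirrors the list comprehension in the traceback: stage.copy(extra) for each stage.
    return [stage.copy(extra) for stage in stages]


# Stage given as an instance: copy works and returns a fresh object.
ok = copy_stages([StandInTransformer()], {})

# Stage given as the class: the unbound call binds `extra` ({}) to `self`,
# copy.copy returns a dict, and setting `_paramMap` on it fails.
try:
    copy_stages([StandInTransformer], {})
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# The message names 'dict' and '_paramMap' (exact wording can vary by Python version).
print(error_message)
```

If this is the cause, instantiating the transformer when building the pipeline stages would avoid the error, since `copy` then operates on a real `Params` object instead of the `extra` dict.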
The custom Transformer code is as follows:
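from pyspark.ml import Transformer
from pyspark.sql import functions as F


class MyCustomTransformer(Transformer):
    def _transform(self, input_df):
        # col1 and W: column-name variables defined elsewhere (not shown in this snippet).
        # Rows where col1 is "A" take column W, rows where it is "B" take column "X".
        new_df = input_df.withColumn(
            "newcol",
            F.when(F.col(col1) == F.lit("A"), F.col(W))
            .when(F.col(col1) == F.lit("B"), F.col("X"))
            .otherwise(...),
        )
        return new_df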
Can someone tell me what I am doing wrong or what I am missing?