Using BinaryClassificationEvaluator

Time: 2019-04-28 13:41:45

Tags: pyspark cross-validation recommendation-engine

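I am using Spark cross-validation to tune the parameters of an explicit ALS model. The evaluator is a BinaryClassificationEvaluator with metricName='areaUnderROC' to compute the AUC, but it fails with an error. My code is as follows: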

from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.recommendation import ALS
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

alsExplicit = ALS(
    implicitPrefs=is_implicit,
    numItemBlocks=100,
    numUserBlocks=100,
    userCol='device_id',
    itemCol='item_id',
    ratingCol='rating',
)
# Grid of 2 x 2 x 2 = 8 parameter combinations
paramMapExplicit = ParamGridBuilder() \
    .addGrid(alsExplicit.rank, [30, 40]) \
    .addGrid(alsExplicit.maxIter, [10, 15]) \
    .addGrid(alsExplicit.regParam, [0.01, 0.1]) \
    .build()
evaluator_AUC = BinaryClassificationEvaluator(
    labelCol='rating',
    rawPredictionCol='prediction',
    metricName='areaUnderROC'
)
cvExplicit = CrossValidator(estimator=alsExplicit, estimatorParamMaps=paramMapExplicit, evaluator=evaluator_AUC, numFolds=5)
cvModelExplicit = cvExplicit.fit(train_data)  # This line raises the error

The error is:

pyspark.sql.utils.IllegalArgumentException: u'requirement failed: Column prediction must be of type equal to one of the following types: [DoubleType, org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7] but was actually of type FloatType.'

When I change the evaluator to RegressionEvaluator, it runs fine, as shown below:

from pyspark.ml.evaluation import RegressionEvaluator

evaluator_RMSE = RegressionEvaluator(
    metricName='rmse',
    labelCol='rating',
    predictionCol='prediction'
)

Also, if I train a model with fixed parameters, use it to transform the test data, and then compute the AUC with BinaryClassificationEvaluator, I get the same error.

model = als.fit(train_data)
pred = model.transform(test_data)
auc = evaluator_AUC.evaluate(pred)

Then I tried casting the type manually:

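from pyspark.sql.types import DoubleType

# Cast the float prediction column to double before evaluating
pred = pred.withColumn('prediction', pred['prediction'].cast(DoubleType()))
auc = evaluator_AUC.evaluate(pred)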

This works.

However, when using cross-validation I cannot change the type of the DataFrame myself. What should I do?

0 answers