My current approach for evaluating different LinearSVC parameters and picking the best ones:
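from pyspark.ml.classification import LinearSVC
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.feature import HashingTF, IDF, Tokenizer
from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit

# Tokenize the text and build TF-IDF features
tokenizer = Tokenizer(inputCol="Text", outputCol="words")
wordsData = tokenizer.transform(df)
hashingTF = HashingTF(inputCol="words", outputCol="rawFeatures")
featurizedData = hashingTF.transform(wordsData)
idf = IDF(inputCol="rawFeatures", outputCol="features")
idfModel = idf.fit(featurizedData)
rescaledData = idfModel.transform(featurizedData)

# Grid of LinearSVC parameters to evaluate
LSVC = LinearSVC()
paramGrid = (ParamGridBuilder()
             .addGrid(LSVC.maxIter, [1])
             .addGrid(LSVC.regParam, [0.001, 10.0])
             .build())

# TrainValidationSplit takes trainRatio (it has no testRatio argument);
# 0.99 keeps 1% of the rows for validation
crossval = TrainValidationSplit(estimator=LSVC,
                                estimatorParamMaps=paramGrid,
                                evaluator=MulticlassClassificationEvaluator(metricName="weightedPrecision"),
                                trainRatio=0.99)
cvModel = crossval.fit(rescaledData.select("KA", "features").selectExpr("KA as label", "features as features"))
bestModel = cvModel.bestModel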
Now I want to get the basic ML metrics (e.g. precision, recall, etc.). How can I obtain them?
Answer 0 (score: 0)
You can try this
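from pyspark.mllib.evaluation import MulticlassMetrics

# predictionAndLabels must be an RDD of (prediction, label) pairs of floats.
# One way to build it; labeledData is a placeholder for a DataFrame with
# "label" and "features" columns:
predictionAndLabels = (bestModel.transform(labeledData)
                       .select("prediction", "label")
                       .rdd.map(lambda row: (float(row[0]), float(row[1]))))

# Instantiate metrics object
metrics = MulticlassMetrics(predictionAndLabels)

# Overall statistics, weighted by class frequency. The no-argument
# precision(), recall() and fMeasure() calls are deprecated and are not
# available in Spark 3.x.
precision = metrics.weightedPrecision
recall = metrics.weightedRecall
f1Score = metrics.weightedFMeasure()
print("Summary Stats")
print("Precision = %s" % precision)
print("Recall = %s" % recall)
print("F1 Score = %s" % f1Score)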
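To make explicit what precision, recall and F1 mean for a set of (prediction, label) pairs, here is a plain-Python sketch of the class-weighted definitions. It does not use Spark. The function name `weighted_metrics` and the sample pairs are made up for illustration, and the weighting (each class weighted by its share of the true labels) reflects my understanding of how Spark's weighted metrics are defined.

```python
from collections import Counter


def weighted_metrics(prediction_and_labels):
    """Compute class-weighted precision, recall and F1 from (prediction, label) pairs."""
    pairs = list(prediction_and_labels)
    label_counts = Counter(label for _, label in pairs)
    predicted_counts = Counter(pred for pred, _ in pairs)
    true_positives = Counter(label for pred, label in pairs if pred == label)

    total = len(pairs)
    precision = recall = f1 = 0.0
    for label, count in label_counts.items():
        tp = true_positives[label]
        # Precision: of everything predicted as this class, the fraction that was right.
        p = tp / predicted_counts[label] if predicted_counts[label] else 0.0
        # Recall: of everything truly in this class, the fraction that was found.
        r = tp / count
        # F1: harmonic mean of precision and recall for this class.
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        # Each class contributes in proportion to its share of the true labels.
        weight = count / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return precision, recall, f1


# Hypothetical sample: four (prediction, label) pairs for a binary problem.
sample_pairs = [(1.0, 1.0), (0.0, 1.0), (0.0, 0.0), (0.0, 0.0)]
precision, recall, f1_score = weighted_metrics(sample_pairs)
print("Precision = %.4f" % precision)
print("Recall = %.4f" % recall)
print("F1 Score = %.4f" % f1_score)
```

For a binary classifier such as LinearSVC you may care more about the positive class alone; in that case only the per-class `p`, `r` and `f` for that label are relevant, not the weighted sum.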
You can check this link for more information:
https://spark.apache.org/docs/2.1.0/mllib-evaluation-metrics.html