I am trying to train a RandomForestRegressor on a dataframe named train, as follows:
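import pyspark.ml.regression
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

rf = pyspark.ml.regression.RandomForestRegressor(featuresCol=self.featuresCol, labelCol=self.labelCol)
# 3 x 3 grid over the number of trees and the maximum tree depth
param_grid = ParamGridBuilder() \
    .addGrid(rf.numTrees, [5, 10, 20]) \
    .addGrid(rf.maxDepth, [5, 10, 15]) \
    .build()
crossval = CrossValidator(estimator=rf,
                          estimatorParamMaps=param_grid,
                          evaluator=RegressionEvaluator(),
                          numFolds=3)
self.model = crossval.fit(train)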
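For a sense of scale, here is a rough back-of-the-envelope sketch of how much work this cross-validation implies. It assumes the usual CrossValidator behaviour of fitting one model per parameter combination per fold and then refitting the best combination once on the full dataframe. The variable names are purely illustrative, and the node count is only an upper bound for a full binary tree, not what Spark necessarily builds.

```python
from itertools import product

# Grid values copied from the ParamGridBuilder call in the question.
num_trees_grid = [5, 10, 20]
max_depth_grid = [5, 10, 15]
num_folds = 3

# Every (numTrees, maxDepth) combination in the grid.
param_maps = list(product(num_trees_grid, max_depth_grid))

# One fit per combination per fold, plus one final refit of the best combination.
cv_fits = len(param_maps) * num_folds
total_fits = cv_fits + 1

# Total number of trees grown during the cross-validation loop itself.
cv_trees = sum(n_trees for n_trees, _ in param_maps) * num_folds

# Upper bound on the node count of one binary tree at the deepest setting:
# a full tree of depth d has 2^(d+1) - 1 nodes.
max_nodes_deepest = 2 ** (max(max_depth_grid) + 1) - 1

print(f"parameter combinations: {len(param_maps)}")
print(f"model fits (including final refit): {total_fits}")
print(f"trees grown during cross-validation: {cv_trees}")
print(f"max nodes in one tree at depth {max(max_depth_grid)}: {max_nodes_deepest}")
```

So a single call to crossval.fit(train) trains 28 forests in total, and the deepest setting allows trees with up to tens of thousands of nodes each.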
Here are the row count, the number of partitions, a sample row, and the schema of the dataframe:
Training on 26398 examples with 8 partitions
{'features': SparseVector(10479, {5: 1.0, 360: 1.0, 361: 0.2444, 362: -0.9697, 363: 1.0, 10476: -0.0685}),
'label': 989}
root
|-- features: vector (nullable = true)
|-- label: long (nullable = true)
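For reference, output like the above can be produced with something along these lines. This is only a sketch: describe_training_frame is a hypothetical helper name, not part of my actual code, and it relies solely on the standard DataFrame methods count(), rdd.getNumPartitions(), first() and printSchema().

```python
def describe_training_frame(df):
    """Print row count, partition count, one sample row and the schema of a Spark DataFrame.

    `df` is assumed to be a pyspark.sql.DataFrame; the helper name is hypothetical.
    """
    n_rows = df.count()                   # triggers a Spark job over the whole dataframe
    n_parts = df.rdd.getNumPartitions()   # number of partitions of the underlying RDD
    summary = f"Training on {n_rows} examples with {n_parts} partitions"
    print(summary)
    print(df.first().asDict())            # first Row, shown as a plain dict
    df.printSchema()                      # prints the tree-style schema
    return summary
```

Calling describe_training_frame(train) just before crossval.fit(train) would print these diagnostics for the training dataframe.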
The final error message after attempting to fit the model:
org.apache.spark.SparkException: Job 44 cancelled because SparkContext was shut down
What is causing this failure?
Master
Workers (4 instances)