Why does this pyspark.ml.RandomForestRegressor fail with a shut-down context?

Date: 2017-08-25 18:47:46

Tags: apache-spark pyspark apache-spark-ml

I'm trying to train a RandomForestRegressor on a dataframe called train, as follows:

from pyspark.ml.regression import RandomForestRegressor
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator
from pyspark.ml.evaluation import RegressionEvaluator

rf = RandomForestRegressor(featuresCol=self.featuresCol, labelCol=self.labelCol)
param_grid = ParamGridBuilder()\
    .addGrid(rf.numTrees, [5, 10, 20]) \
    .addGrid(rf.maxDepth, [5, 10, 15]) \
    .build()

crossval = CrossValidator(estimator=rf,
                          estimatorParamMaps=param_grid,
                          evaluator=RegressionEvaluator(),
                          numFolds=3)

self.model = crossval.fit(train)
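Note that the CrossValidator above trains one model per (parameter map × fold): the grid is 3 values of numTrees × 3 values of maxDepth = 9 parameter maps, and with numFolds=3 that means 27 full random-forest fits (plus a final refit on all the data), which is often where memory pressure shows up. A plain-Python sketch of how the grid expands (a stand-in mirroring ParamGridBuilder's cross-product, not using Spark itself):

```python
from itertools import product

# Hypothetical stand-in for ParamGridBuilder: expand each listed
# hyperparameter into the full cross-product of parameter maps.
def build_param_grid(grids):
    names = list(grids)
    return [dict(zip(names, combo)) for combo in product(*grids.values())]

param_grid = build_param_grid({
    "numTrees": [5, 10, 20],
    "maxDepth": [5, 10, 15],
})

num_folds = 3
# Each parameter map is fit once per fold during cross-validation.
total_fits = len(param_grid) * num_folds

print(len(param_grid))  # 9 parameter maps
print(total_fits)       # 27 fits
```

Each of those 27 fits builds a full forest over all 10,479 features, so the executors' memory budget has to cover the largest forest in the grid, not the average one.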

Here are the row count, partition count, a sample row, and the schema of the dataframe:

Training on 26398 examples with 8 partitions
{'features': SparseVector(10479, {5: 1.0, 360: 1.0, 361: 0.2444, 362: -0.9697, 363: 1.0, 10476: -0.0685}),
 'label': 989}
root
 |-- features: vector (nullable = true)
 |-- label: long (nullable = true)

Final error message after attempting to fit the model:

org.apache.spark.SparkException: Job 44 cancelled because SparkContext was shut down

What is causing this failure?

  • m4.xlarge
  • 8 vCPU
  • 16 GiB memory

Workers (4 instances)

  • r4.xlarge
  • 4 vCPU
  • 30.5 GiB memory

0 Answers:

There are no answers yet.