Question

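I have a cluster running Spark 2.2 on Mesos. I need to run the pure Scala version of XGBoost (v0.7) on it, not the Scala-Spark version, because I need to put the Scala model into production on machines that do not have Spark.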

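I do all of the data preparation in Spark with DataFrames, then collect the result and load it into a DMatrix on the driver. From what I have found, this code runs on the driver.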
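A minimal sketch of the collect-then-load step described above, assuming the collected rows are already available on the driver as an `Array[Array[Float]]`. The object `DenseRows` and its `flatten` helper are hypothetical names introduced only for illustration; they are not part of Spark or XGBoost. The sketch only shows the row-major flattening that a dense matrix constructor needs.

```scala
object DenseRows {
  // Flattens collected feature rows into a single row-major Array[Float].
  // Returns (data, nrow, ncol). Hypothetical helper, not a library API.
  def flatten(rows: Array[Array[Float]]): (Array[Float], Int, Int) = {
    val nrow = rows.length
    val ncol = if (nrow == 0) 0 else rows(0).length
    // A dense matrix needs every row to have the same number of features.
    require(rows.forall(_.length == ncol), "all rows must have the same number of features")
    (rows.flatten, nrow, ncol)
  }
}
```

With xgboost4j on the classpath, the result could then be passed to `new DMatrix(data, nrow, ncol)`, followed by `setLabel(labels)`. Nothing in that path involves Spark executors, which would be consistent with training happening inside the driver JVM.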

Question 1: Since the DMatrix lives on the driver, is it correct that XGBoost runs on the driver?

While XGBoost is running, the Spark UI shows no activity. Nothing is printed to StdErr or StdOut, and the Mesos console shows no activity on the driver either.

Then at some point I get StdErr output such as:

[12:01:24] /xgboost/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 98 extra nodes, 0 pruned nodes, max_depth=6
[12:01:25] /xgboost/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 82 extra nodes, 0 pruned nodes, max_depth=6
[12:01:26] /xgboost/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 78 extra nodes, 0 pruned nodes, max_depth=6

After that, the job goes on to compute metrics and other things. So it appears that XGBoost is running somewhere, but

Question 2: How can I monitor driver activity while XGBoost is running?

Question 3: How can I tell how many cores it is using?
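One way to make the core count explicit rather than inferred is sketched below. It reads the number of logical cores the JVM can see and sets XGBoost's `nthread` parameter in the training parameter map. The object `ThreadSettings` and its methods are hypothetical names used for illustration, and the value reported by the JVM may not match the cores Mesos actually allocated to the driver.

```scala
object ThreadSettings {
  // Number of logical cores visible to this JVM (standard Java API).
  def visibleCores: Int = Runtime.getRuntime.availableProcessors()

  // Returns a copy of the parameter map with an explicit "nthread" entry,
  // clamped to the range [1, visibleCores]. Hypothetical helper.
  def paramsWithThreads(base: Map[String, Any], requested: Int): Map[String, Any] = {
    val capped = math.max(1, math.min(requested, visibleCores))
    base + ("nthread" -> capped)
  }
}
```

For example, `ThreadSettings.paramsWithThreads(Map("max_depth" -> 6), 4)` yields a map that could be passed as the `params` argument of training, so the thread count is a known setting instead of a default.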

Question 4: Does the spark-submit setting driver-cores matter in this case?

Running XGBoost Scala on a Spark cluster?

0 Answers: