Question

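Cluster manager: YARN

Deploy mode: none

I was told that if the deploy mode is set to none, the driver process's stdout is in the root path rather than inside the driver process's container.

Spark UI logs: show the error Container executed on lost node...

Before making this call, I unpersisted all other DataFrames/Datasets to make sure they are not cached in memory.

Calling a simple action such as count() keeps failing.

I am basically doing the following: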

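columnNames.keys.foreach(
  col => {
    // Count the rows in which this column is non-null
    val nonNullColCount =
      dataset.select(dataset(col)).filter(row =>
      row.getAs[Any](col) != null).count()
    // Print the value computed above (the original referenced an undefined name)
    println(nonNullColCount)
  })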

So I am calling count() on the dataset in a loop. In each iteration I select one column from the list of column names.
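
For reference, the same per-column non-null counts can also be computed in a single job, because Spark's `count(column)` aggregate skips nulls. The sketch below is only an illustration and not part of the original code: the helper name `nonNullCounts` is hypothetical, and it assumes the column names contain no dots or other characters that would need backtick quoting.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, count}

// Hypothetical helper: maps each column name to its number of non-null values.
def nonNullCounts(df: DataFrame): Map[String, Long] = {
  val names = df.columns
  // count(col) ignores nulls, so a single aggregation covers every column
  val row = df.select(names.map(c => count(col(c)).as(c)): _*).head()
  names.zipWithIndex.map { case (name, i) => name -> row.getLong(i) }.toMap
}
```

Called as `nonNullCounts(dataset.toDF())`, this scans the data once instead of once per column. It does not by itself stop executors from being lost.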

The error is generic and misleading, of the form:

Job aborted due to stage failure: Task 284 in stage 14.0 failed 4 times,
most recent failure: Lost task 284.3 in stage 14.0 (TID 100923, ip-172-31-50-226.ec2.internal, executor 266): 
ExecutorLostFailure (executor 266 exited caused by one of the running tasks) 
Reason: Container marked as failed: container_1506075842477_0672_01_017877 on host: ip-172-31-50-226.ec2.internal. 
Exit status: -100. 
Diagnostics: Container released on a *lost* node

Answer 1

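If you are using AWS Spot Instances and a Spot Instance is taken away because of a price change, you may get the following error.

Exit status: -100. Diagnostics: Container released on a *lost* node

A workaround is to split the Spark job into many independent steps, so that you can save the result of each step as a file on S3 at short intervals, or to run the job on non-Spot instances.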
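
Applied to the per-column counts in the question, the idea of independent, saved steps could look roughly like the sketch below. Each column is one step whose result is written out immediately, and steps whose output already exists are skipped on a rerun. This is a minimal sketch under assumptions, not a definitive implementation: the helper `countWithCheckpoints`, the output layout, and the `outputRoot` parameter (for example an `s3://` prefix of your own) are hypothetical. It also assumes column names are safe to use as path segments and that the writer leaves the default `_SUCCESS` marker.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper: one small, restartable step per column.
def countWithCheckpoints(spark: SparkSession, df: DataFrame,
                         columns: Seq[String], outputRoot: String): Unit = {
  import spark.implicits._
  columns.foreach { name =>
    val out = new Path(outputRoot, name)
    val fs = out.getFileSystem(spark.sparkContext.hadoopConfiguration)
    // Skip steps that already completed in an earlier run
    if (!fs.exists(new Path(out, "_SUCCESS"))) {
      val n = df.filter(col(name).isNotNull).count()
      // Persist this step's result right away so a lost node costs one step at most
      Seq((name, n)).toDF("column", "nonNullCount")
        .write.mode("overwrite").json(out.toString)
    }
  }
}
```

If the cluster loses nodes partway through, rerunning the same call only recomputes the columns that have no saved result yet.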

Calling a simple count() on a Spark DataFrame fails
