Job progress in Apache Toree

Time: 2016-06-29 12:41:45

Tags: apache-toree

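With Apache Toree you can evaluate arbitrary expressions on Spark. Suppose we want to run some SQL queries, for example: sqlContext.sql(..)

Is it possible to get the progress of such an SQL query, the way Zeppelin shows it? Perhaps Toree could expose some query metrics, such as "X of N tasks done"?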

1 answer:

Answer 0 (score: 1)

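Apache Zeppelin gets this information from Spark's DAGScheduler, via sc.dagScheduler; see the getProgress method of its SparkInterpreter below.

If you don't have direct access to the SparkContext, Spark's monitoring REST API should be the better option.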
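For the REST route, here is a minimal sketch. It assumes the driver's web UI is reachable at http://localhost:4040 (Spark's default UI port) and that you know the application ID (in Toree, sc.applicationId should return it). It calls the /api/v1/applications/<app-id>/jobs?status=running endpoint and sums numTasks and numCompletedTasks over the returned jobs. The class and method names are hypothetical, and the JSON is scanned with a regular expression only to keep the example dependency-free; a real client should use a JSON library. It also counts every running job, not just the jobs of one query.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: job progress from Spark's monitoring REST API (needs Java 11+). */
final class SparkRestProgress {
    // The leading quote keeps "numTasks" from matching "numActiveTasks" etc.
    private static final Pattern NUM_TASKS =
            Pattern.compile("\"numTasks\"\\s*:\\s*(\\d+)");
    private static final Pattern NUM_COMPLETED =
            Pattern.compile("\"numCompletedTasks\"\\s*:\\s*(\\d+)");

    private SparkRestProgress() {}

    /** Sums every integer captured by the pattern in the JSON text. */
    private static long sum(Pattern pattern, String json) {
        Matcher m = pattern.matcher(json);
        long total = 0;
        while (m.find()) {
            total += Long.parseLong(m.group(1));
        }
        return total;
    }

    /** Completed tasks as a 0-100 percentage of all tasks in a jobs JSON array. */
    static int percentFromJobsJson(String json) {
        long total = sum(NUM_TASKS, json);
        long completed = sum(NUM_COMPLETED, json);
        return total == 0 ? 0 : (int) (completed * 100 / total);
    }

    /** Fetches running jobs; baseUrl such as "http://localhost:4040" is an assumption. */
    static int fetchPercent(String baseUrl, String appId)
            throws IOException, InterruptedException {
        URI uri = URI.create(baseUrl + "/api/v1/applications/" + appId + "/jobs?status=running");
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return percentFromJobsJson(response.body());
    }
}
```

Polling fetchPercent periodically while the query runs gives a rough progress figure. Jobs with skipped stages may never reach 100% by this measure; the job data also reports numSkippedTasks, which could be added to the completed count.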

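package org.apache.zeppelin.spark;

import java.lang.reflect.InvocationTargetException;

import org.apache.spark.scheduler.ActiveJob;
import org.apache.spark.scheduler.DAGScheduler;
import org.apache.zeppelin.interpreter.Interpreter;
import org.apache.zeppelin.interpreter.InterpreterContext;

import scala.collection.Iterator;
import scala.collection.mutable.HashSet;

// Excerpt from Zeppelin's SparkInterpreter: the fields sc, sparkVersion,
// sparkListener and logger, and the helpers getJobGroup and
// getProgressFromStage_1_0x / _1_1x, are defined elsewhere in the class.
public class SparkInterpreter extends Interpreter {
  @Override
  public int getProgress(InterpreterContext context) {
    // Each Zeppelin paragraph runs its Spark jobs under its own job group.
    String jobGroup = getJobGroup(context);
    int completedTasks = 0;
    int totalTasks = 0;

    // DAGScheduler and ActiveJob are Spark-internal (private[spark]) classes,
    // so this code relies on internals that can change between Spark versions.
    DAGScheduler scheduler = sc.dagScheduler();
    if (scheduler == null) {
      return 0;
    }
    HashSet<ActiveJob> jobs = scheduler.activeJobs();
    if (jobs == null || jobs.size() == 0) {
      return 0;
    }
    Iterator<ActiveJob> it = jobs.iterator();
    while (it.hasNext()) {
      ActiveJob job = it.next();
      // Only count jobs that belong to this paragraph's job group.
      String g = (String) job.properties().get("spark.jobGroup.id");
      if (jobGroup.equals(g)) {
        int[] progressInfo = null;
        try {
          // finalStage is looked up reflectively to stay compatible across Spark versions.
          Object finalStage = job.getClass().getMethod("finalStage").invoke(job);
          if (sparkVersion.getProgress1_0()) {
            progressInfo = getProgressFromStage_1_0x(sparkListener, finalStage);
          } else {
            progressInfo = getProgressFromStage_1_1x(sparkListener, finalStage);
          }
        } catch (IllegalAccessException | IllegalArgumentException
            | InvocationTargetException | NoSuchMethodException
            | SecurityException e) {
          logger.error("Can't get progress info", e);
          return 0;
        }
        // progressInfo holds {total, completed} task counts for this job.
        totalTasks += progressInfo[0];
        completedTasks += progressInfo[1];
      }
    }

    if (totalTasks == 0) {
      return 0;
    }
    // Integer percentage (0-100) of finished tasks across the group's jobs.
    return completedTasks * 100 / totalTasks;
  }
}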
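Stripped of the Spark internals, the loop above sums total and completed tasks over the active jobs whose spark.jobGroup.id property matches the current group, then reports an integer percentage. The stand-alone sketch below reproduces just that arithmetic, with a hypothetical JobSnapshot class standing in for Spark's ActiveJob, so it can be tried without a Spark dependency. For such a filter to select anything in Toree, the query would presumably have to run inside a named job group, for example by calling sc.setJobGroup("my-query", "description") before sqlContext.sql(..).

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for Spark's internal ActiveJob: only what the loop reads. */
final class JobSnapshot {
    final Map<String, String> properties; // e.g. "spark.jobGroup.id" -> "my-query"
    final int totalTasks;
    final int completedTasks;

    JobSnapshot(Map<String, String> properties, int totalTasks, int completedTasks) {
        this.properties = properties;
        this.totalTasks = totalTasks;
        this.completedTasks = completedTasks;
    }
}

/** Hypothetical helper mirroring the aggregation in getProgress. */
final class GroupProgress {
    static final String JOB_GROUP_KEY = "spark.jobGroup.id";

    private GroupProgress() {}

    /** Percentage (0-100) of finished tasks over the jobs in jobGroup; 0 if none match. */
    static int progress(String jobGroup, List<JobSnapshot> activeJobs) {
        int total = 0;
        int completed = 0;
        for (JobSnapshot job : activeJobs) {
            // Objects.equals tolerates jobs that carry no group property (null).
            if (Objects.equals(jobGroup, job.properties.get(JOB_GROUP_KEY))) {
                total += job.totalTasks;
                completed += job.completedTasks;
            }
        }
        return total == 0 ? 0 : completed * 100 / total;
    }
}
```

Summing tasks before dividing, as Zeppelin does, weights each job by its task count rather than averaging per-job percentages.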